Why does assignment in Go create a copy? - go

I will clarify the question a bit. I have read (almost completely) the Go specification, FAQ, Effective Go, and, of course, Tour of Go.
I know that Go is a "pass by value" language and even managed to reason about this behavior and understand all the implications.
All assignments in Go also create copies. In some cases, it's just a value, in some -- a pointer. For some data structures, it's a bit trickier in that the whole structure is copied and might include an implicit pointer to another data structure.
The question is: what in the language specification says explicitly that assignments always create copies?
I feel like it doesn't even need to be mentioned once you understand that there are no references in Go, but the section on assignment statements in the specification doesn't even mention the pass-by-value semantics.
I feel like there must be something in the documentation that describes the behavior in detail, and that I, due to some foundational misunderstanding, fail to realize the explanation is there.

What in the language specification says explicitly that assignment always creates copies?
Nothing explicit, but you can maybe deduce this from Variables, which nicely addresses also the case of function signatures:
A variable declaration or, for function parameters and results, the signature of a function declaration or function literal reserves storage for a named variable.
If storage is reserved, then when you later assign the result of an expression to it (for example, another variable), the value must be copied; otherwise you would have memory aliasing. This is what Dave Cheney is talking about in There is no pass-by-reference in Go.
Unlike C++, each variable defined in a Go program occupies a unique memory location.
This also has one more important implication, which is the zero value. If you don't provide an expression to initialize a variable in its declaration, the storage reserved for it is given the zero value as default value.
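The points above (reserved storage, copying on assignment, and the zero value) can be sketched in a few lines. This is only an illustration; the point type and the copyAndModify helper are made-up names, not anything from the spec.

```go
package main

import "fmt"

// point is a hypothetical type used only for this illustration.
type point struct{ x, y int }

// copyAndModify assigns p to a new variable, changes the new variable,
// and returns both so the caller can compare them.
func copyAndModify(p point) (original, copied point) {
	q := p   // q has its own storage; the value of p is copied into it
	q.x = 99 // changes q only
	return p, q
}

func main() {
	var zero point    // no initializer: the storage holds the zero value
	fmt.Println(zero) // {0 0}

	a := point{1, 2}
	b := a
	fmt.Println(&a != &b) // true: a and b are distinct storage locations

	orig, cp := copyAndModify(a)
	fmt.Println(orig, cp) // {1 2} {99 2}
}
```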

Without going into too much detail, these excerpts from the spec should provide some clarity:
A variable is a storage location for holding a value.
A variable's value [...] is the most recent value assigned to the variable.
At the language level, defining "copying" of values isn't really necessary. The important implication of copying as we commonly understand it is that modifying the "copied to" value will not alter the "copied from" value. This property can be inferred from the above quotations.
(It is important to note here that "value" and "data structure" are not the same thing. A data structure may be composed of multiple values.)
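To make the distinction between a value and a data structure concrete, here is a small sketch. The inventory type is hypothetical. Assigning the struct copies its fields, including the slice header, but the backing array that the header points to is a separate value and is not copied.

```go
package main

import "fmt"

// inventory is a hypothetical type for illustration.
type inventory struct {
	count int
	items []string // slice header: pointer, length, capacity
}

// shareCheck copies an inventory by assignment and modifies the copy.
func shareCheck() (inventory, inventory) {
	a := inventory{count: 1, items: []string{"bolt"}}
	b := a             // copies count and the slice header
	b.count = 2        // a.count is unaffected
	b.items[0] = "nut" // visible through a.items: same backing array
	return a, b
}

func main() {
	a, b := shareCheck()
	fmt.Println(a.count, a.items) // 1 [nut]
	fmt.Println(b.count, b.items) // 2 [nut]
}
```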

The spec actually explicitly talks about this here:
https://golang.org/ref/spec#Calls
In particular:
After they are evaluated, the parameters of the call are passed by value to the function and the called function begins execution. The return parameters of the function are passed by value back to the caller when the function returns.
About assignments: All assignments in most of the languages I know of create copies of values (see Python exception in the comments). A copy of the value on the RHS is assigned to the LHS. If the RHS is a pointer, then a copy of that pointer is assigned to the LHS.
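The quoted rule for calls can be checked with a short sketch: the argument is copied into the parameter, and the result is copied back out. The config type and bump function are made-up names for this example.

```go
package main

import "fmt"

// config is a hypothetical type for illustration.
type config struct{ retries int }

// bump receives a copy of the caller's config, modifies that copy,
// and returns it by value.
func bump(c config) config {
	c.retries++
	return c
}

func demo() (config, config) {
	orig := config{retries: 1}
	updated := bump(orig) // orig itself is not changed by the call
	return orig, updated
}

func main() {
	fmt.Println(demo()) // {1} {2}
}
```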

Related

How to tell when a variable goes out of scope?

In doing reference counting, one of the tasks is to "decrement the counter when the variable goes out of scope". But my biggest problem is that I can't work out when a variable goes out of scope, at the level of actually implementing a reference counter.
Could someone explain all (or the main) ways in which a variable can go out of scope?
I am specifically talking about the case of a highly advanced programming language, not a toy / introductory undergraduate language. I am thinking of languages like JavaScript or Rust, which have closures and nested function definitions (at least in the case of JavaScript), and also of cases where you are using pointers and mutable function parameters. Say you pass a mutable value to a function, then return a closure using that mutable value, stuff like that.
What are all the ways you can tell when a variable goes out of scope? How do I get this organized enough so I can add it to a reference counter?
A local variable goes out of scope when execution reaches the end of the block in which it was declared.
Variables that are global / static don't ever go out of scope.
Variables that are fields of a composite data type (a class / object, a struct / record, an array, etc) may not have a "scope" per se, but if they do, it is determined by the scope of the composite data type instance they are part of.
If you are trying to analyze this at compile time ... you use a symbol table. This is covered in textbooks on compiler writing.
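As a rough sketch of how "decrement when the variable goes out of scope" might look in practice, here is a manual reference counter in Go. Everything in it (counted, retain, release, useInBlock, capture) is hypothetical. Go's defer runs at function exit rather than block exit, so the function body stands in for the block here. The closure case shows why the decrement cannot always happen at the end of the declaring block.

```go
package main

import "fmt"

// counted is a hypothetical reference-counted resource.
type counted struct {
	name string
	refs int
}

func (c *counted) retain()  { c.refs++ }
func (c *counted) release() { c.refs-- }

// useInBlock holds a reference for the duration of the function body
// and releases it when the body ends, by any exit path.
func useInBlock(c *counted, fail bool) error {
	c.retain()
	defer c.release() // runs on every return from this function
	if fail {
		return fmt.Errorf("early exit while holding %s", c.name)
	}
	return nil
}

// capture returns a closure that still uses c after capture returns,
// so the matching release must wait until the caller invokes done.
func capture(c *counted) (use func() int, done func()) {
	c.retain()
	return func() int { return c.refs }, c.release
}

func main() {
	c := &counted{name: "buf"}
	_ = useInBlock(c, true)
	fmt.Println(c.refs) // 0

	use, done := capture(c)
	fmt.Println(use()) // 1
	done()
	fmt.Println(c.refs) // 0
}
```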

Is there specification for Go's lifetime model of allocated memory?

Go uses escape analysis and garbage collection to manage memory allocation on stack and heap. Go's FAQ also says:
How do I know whether a variable is allocated on the heap or the stack?
From a correctness standpoint, you don't need to know. Each variable in Go exists as long as there are references to it. The storage location chosen by the implementation is irrelevant to the semantics of the language.
So Go allocates memory for a variable and keeps it reserved at least as long as it's needed.
My question is: Is this (abstract) behavior written in The Go Programming Language Specification? I found the allocation part is written, for example, in Allocation section:
The built-in function new takes a type T, allocates storage for a variable of that type at run time, and returns a value of type *T pointing to it.
But is there any description of the reservation part? Can we confirm the fact "Each variable in Go exists as long as there are references to it"? If not, are there any reasons?
For example, I want to confirm the following program must not throw SIGSEGV or similar exceptions if a Go compiler has no bugs.
package main
import "fmt"
// foo returns the address of its local variable x.
func foo() *int {
	x := 42
	return &x
}
func main() {
	px := foo()
	fmt.Println(*px)
}
To be more precise, I expected that the two parts, "Go allocates memory on new or something similar" and "Go reserves the allocated memory at least as long as it's needed", would be written in the specification. I don't care about implementation details, even though https://github.com/golang/go uses escape analysis and garbage collection.
If the latter part does not exist, then in an extreme case it would be a valid implementation according to the spec for the memory to be deallocated immediately after it is allocated. But this is ridiculous, so I think the spec should rule that out.
Edit for close: I don't think this question is opinion-based. This question is a simple yes/no-question, asking for the description in the specification. The reason for the existence/non-existence can be answered with citations. If not, please show/comment which points are opinion-based. I'll improve that.
The specification uses the term variable for a storage location. The specification does not distinguish between storage locations on the heap or the stack. The terms heap and stack are absent from the specification.
The section on variables says:
A variable's value is retrieved by referring to the variable in an expression; it is the most recent value assigned to the variable. If a variable has not yet been assigned a value, its value is the zero value for its type.
If a variable can be referenced, then the variable's value can be retrieved. The compiler and runtime must retain a variable's value when there are extant references to the variable.
But is there any description of the preservation part? Can we confirm the fact "Each variable in Go exists as long as there are references to it"? If not, are there any reasons?
Not in the language specification, no; that is a quality of the runtime, not the language. We can confirm the fact that memory is not collected as long as there are references to it by simply observing that Go programs actually work. If that assumption were not valid, most of the standard library would be invalid, along with pretty much all code written by any Go developer. The Go compiler's escape analysis and garbage collector definitely work.
The FAQ entry you found is canonical and can be relied upon, same as the spec.
The thing that you could imagine causing an issue is the *px in the main function, if the thing pointed to by px no longer exists. However, according to this section: https://golang.org/ref/spec#Address_operators
For an operand x of pointer type *T, the pointer indirection *x denotes the variable of type T pointed to by x. If x is nil, an attempt to evaluate *x will cause a run-time panic.
This basically says that an implementation of Go is bound to give you the value pointed to, unless the pointer is nil, in which case it will panic. The specification does not say how an implementation is to do this, but you can count on any implementation of Go doing this somehow.
This matches what your first quote says.
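A small sketch of the behavior discussed in this thread: a local variable stays usable for as long as something can still reach it, whether through a returned pointer or through a closure. The function names are made up for the example, and nothing here depends on where the implementation chooses to store the variables.

```go
package main

import "fmt"

// newCounter returns a closure over the local variable n. n stays
// reachable through the closure after newCounter has returned.
func newCounter() func() int {
	n := 0
	return func() int {
		n++
		return n
	}
}

// newInt returns the address of a local variable, like foo in the question.
func newInt(v int) *int {
	x := v
	return &x
}

func main() {
	next := newCounter()
	next()
	fmt.Println(next()) // 2

	p, q := newInt(42), newInt(7)
	fmt.Println(*p, *q) // 42 7
}
```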

Is there a meaningful difference between pass_by_reference vs pass_by_object_sharing in ruby?

Context: I argue that saying pass_by_reference when it's really pass_by_sharing is misleading.
Here is the excerpt from the book "Effective Ruby" I'm arguing against
"Most objects are passed around as references and not as actual values. When these types of objects are inserted into a container the collection class is actually storing a reference to the object and not the object itself. (The notable exception to the rule is the Fixnum class whose objects are always passed by value and not by reference.)
The same is true when objects are passed as method arguments. The method will receive a reference to the object and not a new copy. This is great for efficiency but has a startling implication.
"
The 'call by value' and 'call by object sharing' terminology matches Ruby's behavior, and
the terminology is consistent with other object-oriented languages that have the same
semantics.
'Call by value' and 'call by object sharing' basically mean the same thing in object-oriented languages, so which one is used doesn't really matter. Someone just thought it would clarify the confusion in the terminology to add more terminology.
If 'call by reference' was implemented in Ruby though, it would be something like:
def f(byref x)
  x = "CHANGED"
end
x = ""
f(x)
# x is now "CHANGED"
Here, the value of x is changed. The value being which object x refers to.
Using terms like 'call by reference' just creates confusion though, because they mean
different things to different people. It's unnecessary in
languages like Ruby because you don't have a choice. In languages with different
calling mechanisms like C++ and C# it makes more sense to teach these terms because
they have a real effect on programs and we can come up with non-hypothetical examples
of them.
When explaining parameters in Ruby, you don't need to use any of these terms though.
They're meaningless to people that don't already know the language. Just
describe the behavior itself without that terminology and avoid the baggage.
I would say if you insist on using these terms, then use 'call by value' because it's usually considered more correct. The 'Programming Ruby' book calls it 'call by value', as well as plenty of Ruby programmers. Using the term with a different meaning than its technical one isn't helpful.
You are right. Ruby is pass-by-value only. The semantics of passing and assigning in Ruby are exactly identical to those in Java. And Java is universally described (on Stack Overflow and the rest of the Internet) as pass-by-value only. Terms about languages such as pass-by-value and pass-by-reference must be consistently used across languages to be meaningful.
The thing that is often misunderstood by people who say Java, Ruby, etc. "pass objects by reference" is that "objects" are not values in these languages, and thus cannot be "passed". The value of every variable and result of every expression is a "reference", which is a pointer to an object. The expression for creating an object returns an object pointer; when you access an attribute through the dot notation, the left side takes an object pointer; when you assign one variable to another, you copy the pointer resulting in two pointers to the same object. You always deal with pointers to objects, never objects themselves.
This is made explicit in Java as the only types in Java are primitive types and reference types -- there are no "object types". So every value in Java that is not a primitive is a reference (a pointer to an object). Ruby is dynamically-typed, so variables don't have explicit types. But you can imagine a dynamically-typed language as just a statically-typed language having exactly one type; and for languages like Python and Ruby, if this type were described, it would be a pointer-to-object type.
The issue ultimately boils down to a problem of definitions. People argue over things because there is no precise definition, or they each have slightly different definitions. Rather than argue over vaguely-defined things like what is the "value" of a variable, or whether named values are "variables" or "names", etc., we need to use a definition for pass-by-value and pass-by-reference that is based purely on semantics of a language structure. @fgb's answer provides a clear semantic test for pass-by-reference. In "true pass-by-reference", e.g. with & in C++ and PHP, or with ref or out in C#, simple assignment (i.e. =) to a parameter variable has the same effect as simple assignment to the passed variable in the original scope. In pass-by-value, simple assignment (i.e. =) to a parameter variable has no effect in the original scope. This is what we see in Java, Python, Ruby, and many other languages.
I dislike people coming up with new names like "pass by object sharing", when they don't understand that the semantics are covered by an existing term, pass-by-value. Adding a new term only adds more to the confusion rather than reduce it, because it does not resolve the definitions of existing terms, only adding a new term that also needs to be defined.
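The semantic test described above (does simple assignment to a parameter affect the caller?) can be tried in any pass-by-value language. Here is a sketch in Go rather than Ruby, using a pointer to stand in for an object reference. The user type and both functions are hypothetical.

```go
package main

import "fmt"

// user is a hypothetical type standing in for an object.
type user struct{ name string }

// rebind assigns to the parameter itself. Under pass-by-value this
// changes only the function's local copy of the pointer.
func rebind(u *user) {
	u = &user{name: "CHANGED"}
}

// rename mutates the object that both copies of the pointer refer to.
func rename(u *user) {
	u.name = "renamed"
}

func run() (afterRebind, afterRename string) {
	u := &user{name: "original"}
	rebind(u)
	afterRebind = u.name
	rename(u)
	afterRename = u.name
	return
}

func main() {
	fmt.Println(run()) // original renamed
}
```

Assignment to the parameter has no effect in the caller, while mutation through the shared pointer does, which is the behavior the answer calls pass-by-value.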

Move Semantics in Golang

This is from Bjarne Stroustrup's The C++ Programming Language, Fourth Edition, section 3.3.2.
We didn’t really want a copy; we just wanted to get the result out of
a function: we wanted to move a Vector rather than to copy it.
Fortunately, we can state that intent:
class Vector {
    // ...
    Vector(const Vector& a);            // copy constructor
    Vector& operator=(const Vector& a); // copy assignment
    Vector(Vector&& a);                 // move constructor
    Vector& operator=(Vector&& a);      // move assignment
};
Given that definition, the compiler will choose the move constructor
to implement the transfer of the return value out of the function.
This means that r=x+y+z will involve no copying of Vectors. Instead,
Vectors are just moved. As is typical, Vector’s move constructor is
trivial to define...
I know Golang supports traditional passing by value and passing by reference using Go style pointers.
Does Go support "move semantics" the way C++11 does, as described by Stroustrup above, to avoid the useless copying back and forth? If so, is this automatic, or does it require us to do something in our code to make it happen?
Note: A few answers have been posted - I have to digest them a bit, so I haven't accepted one yet - thanks.
Here is the breakdown:
Everything in Go is passed by value.
But there are five built-in "reference types" which are passed by value as well but internally they hold references to separately maintained data structure: maps, slices, channels, strings and function values (there is no way to mutate the data the latter two reference).
Your own answer, @Vector, is incorrect in that nothing in Go is passed by reference. Rather, there are types with reference semantics. Values of them are still passed by value (sic!).
Your confusion presumably stems from the fact that your mind is currently burdened by C++, Java, etc., while these things in Go are done mostly "as in C".
Take arrays and slices for instance. An array is passed by value in Go, but a slice is a packed struct containing a pointer (to an underlying array) and two platform-sized integers (the length and the capacity of the slice), and it's the value of this structure which is copied — a pointer and two integers — when it's assigned or returned etc. Should you copy a "bare" array, it would be copied literally — with all its elements.
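The array-versus-slice difference is easy to observe. The following is a minimal sketch; the demo function is just a name for this example.

```go
package main

import "fmt"

// demo copies an array and a slice by assignment and modifies each copy.
func demo() ([3]int, [3]int, []int, []int) {
	arr := [3]int{1, 2, 3}
	arrCopy := arr // copies all three elements
	arrCopy[0] = 100

	sl := []int{1, 2, 3}
	slCopy := sl // copies only the pointer, length, and capacity
	slCopy[0] = 100

	return arr, arrCopy, sl, slCopy
}

func main() {
	fmt.Println(demo()) // [1 2 3] [100 2 3] [100 2 3] [100 2 3]
}
```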
The same applies to channels and maps. You can think of types defining channels and maps as declared something like this:
type Map struct {
	impl *mapImplementation
}
type Slice struct {
	impl *sliceImplementation
}
(By the way, if you know C++, you should be aware that some C++ code uses this trick to lower exposure of the details into header files.)
So when you later have
m := make(map[int]string)
you could think of it as m having the type Map and so when you later do
x := m
the value of m gets copied, but it contains just a single pointer, and so both x and m now reference the same underlying data structure. Was m copied by reference ("move semantics")? Surely not! Do values of type map and slice and channel have reference semantics? Yes!
Note that these three types of this kind are not at all special: implementing your custom type by embedding in it a pointer to some complicated data structure is a rather common pattern.
In other words, Go allows the programmer to decide what semantics they want for their types. And Go happens to have five built-in types which have reference semantics already (while all the other built-in types have value semantics). Picking one semantics over the other does not affect the rule of copying everything by value in any way. For instance, it's fine to have pointers to values of any kind of type in Go and to assign them (so long as they have compatible types); these pointers will be copied by value.
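As an illustration of choosing reference semantics for your own type, here is a sketch of a struct that wraps a single pointer, next to the built-in map case from above. The registry and registryImpl types and their methods are invented for this example.

```go
package main

import "fmt"

// registry is a hypothetical type with reference semantics: the struct
// holds one pointer, so copying the struct copies only that pointer.
type registry struct {
	impl *registryImpl
}

type registryImpl struct {
	names []string
}

func newRegistry() registry { return registry{impl: &registryImpl{}} }

// add has a value receiver, yet the change is visible to every copy.
func (r registry) add(name string) { r.impl.names = append(r.impl.names, name) }
func (r registry) size() int       { return len(r.impl.names) }

func demo() (mapLen, registrySize int) {
	m := make(map[int]string)
	x := m // the map value is copied; both refer to one hash table
	x[1] = "one"

	r := newRegistry()
	s := r // copied by value, shares impl
	s.add("alice")

	return len(m), r.size()
}

func main() {
	fmt.Println(demo()) // 1 1
}
```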
Another angle to look at this is that many Go packages (standard and 3rd-party) prefer to work with pointers to (complex) values. One example is os.Open() (which opens a file on a filesystem) returning a value of the type *os.File. That is, it returns a pointer and expects the calling code to pass this pointer around. Surely, the Go authors might have declared os.File to be a struct containing a single pointer, essentially making this value have reference semantics but they did not do that. I think the reason for this is that there's no special syntax to work with the values of this type so there's no reason to make them work as maps, channels and slices. KISS, in other words.
Recommended reading:
"Go Data Structures"
"Go Slices: Usage and Internals"
"Arrays, slices (and strings): The mechanics of 'append'"
A thread on golang-nuts; pay close attention to the reply by Rob Pike.
The Go Programming Language Specification
Calls
In a function call, the function value and arguments are evaluated in
the usual order. After they are evaluated, the parameters of the call
are passed by value to the function and the called function begins
execution. The return parameters of the function are passed by value
back to the calling function when the function returns.
In Go, everything is passed by value.
Rob Pike
In Go, everything is passed by value. Everything.
There are some types (pointers, channels, maps, slices) that have
reference-like properties, but in those cases the relevant data
structure (pointer, channel pointer, map header, slice header) holds a
pointer to an underlying, shared object (pointed-to thing, channel
descriptor, hash table, array); the data structure itself is passed by
value. Always.
Always.
-rob
It is my understanding that Go, as well as Java and C# never had the excessive copying costs of C++, but do not solve ownership transference to containers. Therefore there is still copying involved. As C++ becomes more of a value-semantics language, with references/pointers being relegated to i) smart-pointer managed objects inside classes and ii) dependence references, move semantics solves the problem of excessive copying. Note that this has nothing to do with "pass by value", nowadays everyone passes objects by Reference (&) or Const Reference (const &) in C++.
Let's look at this (1) :
BigObject BO(big,stuff,inside);
vector<BigObject> vo;
vo.reserve(1000000);
vo.push_back(BO);
Or (2)
vector<BigObject> vo;
vo.reserve(1000000);
vo.push_back(BigObject(big,stuff,inside));
Although you're passing by reference to the vector vo, in C++03 there was a copy inside the vector code.
In the second case, there is a temporary object that has to be constructed and then is copied inside the vector. Since it can only be accessed by the vector, that is a wasteful copy.
However, in the first case, our intent could be just to give control of BO to the vector itself. C++11 allows this:
(1, C++11)
vector<BigObject> vo;
vo.reserve(1000000);
vo.emplace_back(big,stuff,inside);
Or (2, C++11)
vector<BigObject> vo;
vo.reserve(1000000);
vo.push_back(BigObject(big,stuff,inside));
From what I've read, it is not clear that Java, C# or Go are exempt from the same copy duplication that C++03 suffered from in the case of containers.
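For the Go side of this question, a short sketch shows what happens when a large struct is put into a slice. Go has no move constructors, so this is not move semantics: appending a struct value copies it, and the usual way to avoid the copy is to store pointers. The bigObject type is hypothetical.

```go
package main

import "fmt"

// bigObject is a hypothetical large value type.
type bigObject struct {
	payload [1 << 10]byte
	id      int
}

func demo() (inSlice, originalAfterValueEdit, originalAfterPointerEdit int) {
	bo := bigObject{id: 1}

	byValue := make([]bigObject, 0, 4)
	byValue = append(byValue, bo) // copies the whole struct into the slice
	byValue[0].id = 2             // bo is untouched
	originalAfterValueEdit = bo.id

	byPointer := make([]*bigObject, 0, 4)
	byPointer = append(byPointer, &bo) // copies only a pointer
	byPointer[0].id = 3                // writes through to bo

	return byValue[0].id, originalAfterValueEdit, bo.id
}

func main() {
	fmt.Println(demo()) // 2 1 3
}
```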
The old-fashioned COW (copy-on-write) technique, also had the same problems, since the resources will be copied as soon as the object inside the vector is duplicated.
Stroustrup is talking about C++, which allows you to pass containers, etc by value - so the excessive copying becomes an issue.
In Go (as in Delphi, Java, etc.), when you pass a built-in container type such as a slice or map, what gets passed is a small value that refers to the underlying data, so it's a non-issue. Regardless, you don't have to deal with it or worry about it in Go: the compiler just does what it needs to do, and from what I've seen thus far, it's doing it right.
Thanks to @KerrekSB for putting me on the right track.
@KerrekSB - I hope this is the right answer. If it's wrong, you bear no responsibility. :)

Ruby Terminology Question: Is this a Ruby declaration, definition and assignment, all at the same time?

If I say:
x = "abc"
this seems like a declaration, definition and assignment, all at the same time, regardless of whether I have said anything about x in the program before.
Is this correct?
I'm not sure what the correct terminology is in Ruby for declarations, definitions and assignments, or if there is even a distinction between these things because of the dynamic typing in Ruby.
@tg: Regarding your point #2: even if x existed before the x = "abc" statement, couldn't you call the x = "abc" statement a definition/re-definition?
Declaration: No.
It doesn't make sense to talk about declaring variables in Ruby, because there's nothing analogous to a declaration in the language. Languages designed for compilers have declarations because the compiler needs to know in advance how big datatypes are and how to access different parts of them. For example, if I say in C:
int *i;
then the compiler knows that somewhere there is some memory set aside for i, and it's as big as it needs to be to hold a pointer to an int. Eventually the linker will hook all the references to i together, but at least the compiler knows it's out there somewhere.
Definition: Probably.
A definition typically sets an initial value for something (at least in the familiar compiled languages). If x didn't exist before the x = "abc" statement, then I guess you could call this a definition, since that is when Ruby has to assign a value to the symbol x.
Again, though, definition is a specific term that people typically use to distinguish the initial, static assignment of a value to some variable from that variable's declaration. In Ruby, you don't have that kind of statement. You typically just say a variable is defined if it's been assigned a value somewhere in your current scope, and you say it's undefined if it hasn't.
You usually don't talk about it having a definition, because in Ruby that just amounts to assignment. There's no special context that would justify you saying definition like there is in other languages.
Which brings us to...
Assignment: Yes.
You can definitely call this an assignment, since it is assigning a value to the symbol x. I don't think anyone will disagree with that.
Pretty much. And if, on the very next line, you do:
x = 1
Then you've just re-defined it, as well as assigned it (it's now an integer, not a string). Duck typing is very different from what you're probably used to.

Resources