How to tell when a variable goes out of scope? - algorithm

In doing reference counting, one of the tasks is to "decrement the counter when the variable goes out of scope". But my biggest problem is I can't tell in my head when a variable goes out of scope, at the implementation level of implementing a reference counter.
Could one explain all (or the main) ways in which a variable can go out of scope?
I am specifically talking about in the case of a highly advanced programming language, not a toy / introductory undergraduate language. I am thinking like with JavaScript or Rust, which has closures and nested function definitions (at least in the case of JavaScript). Also when you are using pointers and such and using mutable function parameters. Say you pass in a mutable value to a function, then return a closure using that mutable value, stuff like that.
What are all the ways you can tell when a variable goes out of scope? How do I get this organized enough so I can add it to a reference counter?

A local variable goes out of scope when execution reaches the end of the block in which it was declared.
Variables that are global / static don't ever go out of scope.
Variables that are fields of a composite data type (an class / object, a struct / record, an array, etc) may not have a "scope" per se, but if they do, it is determined by the scope of the composite data type instance they are part of.
If you are trying to analyses this at compile time ... you use a symbol table. This is covered in textbooks on compiler writing.

Related

Why does assignment in Go create a copy?

I will clarify the question a bit. I have read (almost completely) the Go specification, FAQ, Effective Go, and, of course, Tour of Go.
I know that Go is a "pass by value" language and even managed to reason about this behavior and understand all the implications.
All assignments in Go also create copies. In some cases, it's just a value, in some -- a pointer. For some data structures, it's a bit trickier in that the whole structure is copied and might include an implicit pointer to another data structure.
The question is: what in the language specification says explicitly that assignments always create copies?
I feel like it doesn't even need to be mentioned once you understand that there are no references in Go, but the section on assignment statements in the specification doesn't even mention the pass-by-value semantics.
I feel like there must be something in the documentation that describes the behavior in detail, and I, due to lack of some foundational misunderstanding, fail to realize the explanation is there.
What in the language specification says explicitly that assignment always creates copies?
Nothing explicit, but you can maybe deduce this from Variables, which nicely addresses also the case of function signatures:
A variable declaration or, for function parameters and results, the signature of a function declaration or function literal reserves storage for a named variable.
If storage is reserved, when later you assign the result of a unary expression to it — e.g. another variable —, then it must be a copy, otherwise you would have memory aliasing. Which is what Dave Cheney is talking about in There is no pass-by-reference in Go.
Unlike C++, each variable defined in a Go program occupies a unique memory location.
This also has one more important implication, which is the zero value. If you don't provide an expression to initialize a variable in its declaration, the storage reserved for it is given the zero value as default value.
Without going into too much detail, these excerpts from the spec should provide some clarity:
A variable is a storage location for holding a value.
A variable's value [...] is the most recent value assigned to the variable.
At the language level, defining "copying" of values isn't really necessitated. The important implication of copying as we commonly understand it, is that modifying the "copied to" value will not alter the "copied from" value *. This property can be inferred from the above quotations.
(important to note here that "value" and "data structure" are not the same thing. A data structure may be comprised of multiple values.)
The spec actually explicitly talks about this here:
https://golang.org/ref/spec#Calls
In particular:
After they are evaluated, the parameters of the call are passed by value to the function and the called function begins execution. The return parameters of the function are passed by value back to the caller when the function returns.
About assignments: All assignments in most of the languages I know of create copies of values (see Python exception in the comments). A copy of the value on the RHS is assigned to the LHS. If the RHS is a pointer, then a copy of that pointer is assigned to the LHS.

Does defining a constant consume some memory?

I am using Go to build an application.
I need to define a lot of constants for architectural purpose. Like we have a section named posts and I want to perform some action on it. Its log will be saved in the system with type posts.
Problem
I have around 50 sections like this. And for ease of use of section type I wanted to define section types as constants. But Like variables consumes some space in Go, do the constants too? Should I define them like this for multi purpose or refer the type everywhere with posts string.
What should I follow for my requirement?
Does defining a constant consume some memory?
No.
And yes. Er, sort of, but not really.
Constants are a compile-time concept in Go. That means they don't actually exist while the program is running, so in that sense, no, they don't use memory.
However, typically constants don't exist in a vacuum. They're typically used somewhere in your code. For example:
const defaultName = "Unnamed"
/* then later */
var name = defaultName
Now the variable name is using memory, and it is assigned a value from the constant defaultName. So the constant, itself, is not using memory, per se, but the thing referencing the constant uses memory. You could also create many (potentially thousands or more) variables that all refer to the same constant, and would thus use many times more memory.
Generally, you can imagine that every constant is replaced with its literal value. If that literal value would "consume memory", then memory is consumed.
That is, this is equivalent to the above code snippet:
var name = "Unnamed"
So, the constant uses (or doesn't) the same memory that a literal value uses (or doesn't).

What is the best way to manage a large quantity of constants

I am currently working on a very complex program that processes rows from an input table and has a huge number of possible outcomes for each record. Because of this I have a very large number of constants defined for the outcome messages. There is one success message for the record, but a multitude of possible warnings and errors.
My first thought was to define all of my constants for these messages at the package body level, but then I decided to move each constant to the procedure where it is used. I'm now second guessing that decision and thinking of moving everything back to package body level. What is the best way to define this many constants? Ease of maintainability is my ultimate goal for this program since it is so complex.
I think this is a matter of taste. In my application I put all error codes into an Error-Package. All main and commonly used constants I put into a separate package (without a package body).
Again, a matter of taste, but I tend to put a list of named constants at the package spec level rather than the package body so that they can be referenced by any portion of the application. If I ever want to change the error code that c_err_for_specific_reason_x uses, it becomes a single place to do so.
If I wanted to hide the codes and put them within the body I would have a get_error_code(p_get_error_name varchar) function that did the translation based on you passing a valid constant name.
I've done both on different projects, but tend towards the list over the function most times. I tend to use the function if it a table-driven source of the data.
It ... wait for it ... depends!
Since you currently define your constants in the package body, you don't need them to be publicly accessible outside the package. So defining them in a spec really doesn't buy you anything.
Here's is the rule I follow: Define constants within the smallest scope needed. So if a constant is used only within one procedure, define it in that procedure. If it is used within more than one procedure, define it in the body. If it is used elsewhere by code in other packages (or non-packaged SPs) but only when using a particular package, define it in the spec of that package. If it is used by other code for general use, put it in a separate spec of such general constants.

Is there a meaningful difference between pass_by_reference vs pass_by_object_sharing in ruby?

Context: i argue that saying pass_by_reference when it's really pass_by_sharing is misleading
Here is the excerpt from the book "Effective Ruby" I'm arguing against
"Most objects are passed around as references and not as actual values. When these types of objects are inserted into a container the collection class is actually storing a reference to the object and not the object itself. (The notable exception to the rule is the Fixnum class whose objects are always passed by value and not by reference.)
The same is true when objects are passed as method arguments. The method will receive a reference to the object and not a new copy. This is great for efficiency but has a startling implication.
"
The 'call by value' and 'call by object sharing' terminology matches Ruby's behavior, and
the terminology is consistent with other object orientated languages that have the same
semantics.
'Call by value' and 'call by object sharing' basically mean the same thing in object orientated languages, so which one is used doesn't really matter. Someone just thought it would clarify the confusion in the terminology to add more terminology.
If 'call by reference' was implemented in Ruby though, it would be something like:
def f(byref x)
x = "CHANGED"
end
x = ""
f(x)
# X is "CHANGED"
Here, the value of x is changed. The value being which object x refers to.
Using terms 'call by reference' just creates confusion though because they mean
different things to different people. It's unnecessary in
languages like Ruby because you don't have a choice. In languages with different
calling mechanisms like C++ and C# it makes more sense to teach these terms because
they have a real effect on programs and we can come up with non hypothetical examples
of them.
When explaining parameters in Ruby, you don't need to use any of these terms though.
They're meaningless to people that don't already know the language. Just
describe the behavior itself without that terminology and avoid the baggage.
I would say if you insist on using these terms, then use 'call by value' because it's usually considered more correct. The 'Programming Ruby' book calls it 'call by value', as well as plenty of Ruby programmers. Using the term with a different meaning than its technical one isn't helpful.
You are right. Ruby is pass-by-value only. The semantics of passing and assigning in Ruby are exactly identical to those in Java. And Java is universally described (on Stack Overflow and the rest of the Internet) as pass-by-value only. Terms about languages such as pass-by-value and pass-by-reference must be consistently used across languages to be meaningful.
The thing that is often misunderstood by people who say Java, Ruby, etc. "pass objects by reference" is that "objects" are not values in these languages, and thus cannot be "passed". The value of every variable and result of every expression is a "reference", which is a pointer to an object. The expression for creating an object returns an object pointer; when you access an attribute through the dot notation, the left side takes an object pointer; when you assign one variable to another, you copy the pointer resulting in two pointers to the same object. You always deal with pointers to objects, never objects themselves.
This is made explicit in Java as the only types in Java are primitive types and reference types -- there are no "object types". So every value in Java that is not a primitive is a reference (a pointer to an object). Ruby is dynamically-typed, so variables don't have explicit types. But you can imagine a dynamically-typed language as just a statically-typed language having exactly one type; and for languages like Python and Ruby, if this type were described, it be a pointer-to-object type.
The issue ultimately boils down to a problem of definitions. People argue over things because there is no precise definition, or they each have slightly different definitions. Rather then argue over vaguely-defined things like what is the "value" of a variable, or whether named values are "variables" or "names", etc., we need to use a definition for pass-by-value and pass-by-reference that is based purely on semantics of a language structure. #fgb's answer provides a clear semantic test for pass-by-reference. In "true pass-by-reference", e.g. with & in C++ and PHP, or with ref or out in C#, simple assignment (i.e. =) to a parameter variable has the same effect as simple assignment to the passed variable in the original scope. In pass-by-value, simple assignment (i.e. =) to a parameter variable has no effect in the original scope. This is what we see in Java, Python, Ruby, and many other languages.
I dislike people coming up with new names like "pass by object sharing", when they don't understand that the semantics are covered by an existing term, pass-by-value. Adding a new term only adds more to the confusion rather than reduce it, because it does not resolve the definitions of existing terms, only adding a new term that also needs to be defined.

Move Semantics in Golang

This from Bjarne Stroustrup's The C++ Programming Language, Fourth Edition 3.3.2.
We didn’t really want a copy; we just wanted to get the result out of
a function: we wanted to move a Vector rather than to copy it.
Fortunately, we can state that intent:
class Vector {
// ...
Vector(const Vector& a); // copy constructor
Vector& operator=(const Vector& a); // copy assignment
Vector(Vector&& a); // move constructor
Vector& operator=(Vector&& a); // move assignment
};
Given that definition, the compiler will choose the move constructor
to implement the transfer of the return value out of the function.
This means that r=x+y+z will involve no copying of Vectors. Instead,
Vectors are just moved.As is typical, Vector’s move constructor is
trivial to define...
I know Golang supports traditional passing by value and passing by reference using Go style pointers.
Does Go support "move semantics" the way C++11 does, as described by Stroustrup above, to avoid the useless copying back and forth? If so, is this automatic, or does it require us to do something in our code to make it happen.
Note: A few answers have been posted - I have to digest them a bit, so I haven't accepted one yet - thanks.
The breakdown is like here:
Everything in Go is passed by value.
But there are five built-in "reference types" which are passed by value as well but internally they hold references to separately maintained data structure: maps, slices, channels, strings and function values (there is no way to mutate the data the latter two reference).
Your own answer, #Vector, is incorrect is that nothing in Go is passed by reference. Rather, there are types with reference semantics. Values of them are still passed by value (sic!).
Your confusion suppsedly stems from the fact your mind is supposedly currently burdened by C++, Java etc while these things in Go are done mostly "as in C".
Take arrays and slices for instance. An array is passed by value in Go, but a slice is a packed struct containing a pointer (to an underlying array) and two platform-sized integers (the length and the capacity of the slice), and it's the value of this structure which is copied — a pointer and two integers — when it's assigned or returned etc. Should you copy a "bare" array, it would be copied literally — with all its elements.
The same applies to channels and maps. You can think of types defining channels and maps as declared something like this:
type Map struct {
impl *mapImplementation
}
type Slice struct {
impl *sliceImplementation
}
(By the way, if you know C++, you should be aware that some C++ code uses this trick to lower exposure of the details into header files.)
So when you later have
m := make(map[int]string)
you could think of it as m having the type Map and so when you later do
x := m
the value of m gets copied, but it contains just a single pointer, and so both x and m now reference the same underlying data structure. Was m copied by reference ("move semantics")? Surely not! Do values of type map and slice and channel have reference semantincs? Yes!
Note that these three types of this kind are not at all special: implementing your custom type by embedding in it a pointer to some complicated data structure is a rather common pattern.
In other words, Go allows the programmer to decide what semantics they want for their types. And Go happens to have five built-in types which have reference semantics already (while all the other built-in types have value semantics). Picking one semantics over the other does not affect the rule of copying everything by value in any way. For instance, it's fine to have pointers to values of any kind of type in Go, and assign them (so long they have compatible types) — these pointers will be copied by value.
Another angle to look at this is that many Go packages (standard and 3rd-party) prefer to work with pointers to (complex) values. One example is os.Open() (which opens a file on a filesystem) returning a value of the type *os.File. That is, it returns a pointer and expects the calling code to pass this pointer around. Surely, the Go authors might have declared os.File to be a struct containing a single pointer, essentially making this value have reference semantics but they did not do that. I think the reason for this is that there's no special syntax to work with the values of this type so there's no reason to make them work as maps, channels and slices. KISS, in other words.
Recommended reading:
"Go Data Structures"
"Go Slices: Usage and Internals"
Arrays, slices (and strings): The mechanics of 'append'"
A thead on golang-nuts — pay close attention to the reply by Rob Pike.
The Go Programming Language Specification
Calls
In a function call, the function value and arguments are evaluated in
the usual order. After they are evaluated, the parameters of the call
are passed by value to the function and the called function begins
execution. The return parameters of the function are passed by value
back to the calling function when the function returns.
In Go, everything is passed by value.
Rob Pike
In Go, everything is passed by value. Everything.
There are some types (pointers, channels, maps, slices) that have
reference-like properties, but in those cases the relevant data
structure (pointer, channel pointer, map header, slice header) holds a
pointer to an underlying, shared object (pointed-to thing, channel
descriptor, hash table, array); the data structure itself is passed by
value. Always.
Always.
-rob
It is my understanding that Go, as well as Java and C# never had the excessive copying costs of C++, but do not solve ownership transference to containers. Therefore there is still copying involved. As C++ becomes more of a value-semantics language, with references/pointers being relegated to i) smart-pointer managed objects inside classes and ii) dependence references, move semantics solves the problem of excessive copying. Note that this has nothing to do with "pass by value", nowadays everyone passes objects by Reference (&) or Const Reference (const &) in C++.
Let's look at this (1) :
BigObject BO(big,stuff,inside);
vector<BigObject> vo;
vo.reserve(1000000);
vo.push_back(BO);
Or (2)
vector<BigObject> vo;
vo.reserve(1000000);
vo.push_back(BigObject(big,stuff,inside));
Although you're passing by reference to the vector vo, in C++03 there was a copy inside the vector code.
In the second case, there is a temporary object that has to be constructed and then is copied inside the vector. Since it can only be accessed by the vector, that is a wasteful copy.
However, in the first case, our intent could be just to give control of BO to the vector itself. C++17 allows this:
(1, C++17)
vector<BigObject> vo;
vo.reserve(1000000);
vo.emplace_back(big,stuff,inside);
Or (2, C++17)
vector<BigObject> vo;
vo.reserve(1000000);
vo.push_back(BigObject(big,stuff,inside));
From what I've read, it is not clear that Java, C# or Go are exempt from the same copy duplication that C++03 suffered from in the case of containers.
The old-fashioned COW (copy-on-write) technique, also had the same problems, since the resources will be copied as soon as the object inside the vector is duplicated.
Stroustrup is talking about C++, which allows you to pass containers, etc by value - so the excessive copying becomes an issue.
In Go, (like in Delphi, Java, etc) when you pass a container type, etc they are always references, so it's a non-issue. Regardless, you don't have to deal with it or worry about in GoLang - the compiler just does what it needs to do, and from what I've seen thus far, it's doing it right.
Tnx to #KerrekSB for putting me on the right track.
#KerrekSB - I hope this is the right answer. If it's wrong, you bear no responsibility.:)

Resources