Embedded instance variables in Ruby 1.9? - ruby

the new object structure in 1.9 embeds some ivars into objects for faster access:
#define ROBJECT_EMBED_LEN_MAX 3
struct RObject {
struct RBasic basic;
union {
struct {
long numiv;
VALUE *ivptr;
struct st_table *iv_index_tbl;
} heap;
VALUE ary[ROBJECT_EMBED_LEN_MAX];
} as;
};
My question is are the first 3 ivars always embedded? or are they only embedded if the number of ivars is <=3 ?
I've tried reading the source but find it next to incomprehensible.
Thanks

The instance variable heap (called heap) and the embedded instance variables (called ary) are contained in a union. You'll also find some macros defined below the snippit you quoted that all look like:
#define ROBJECT_IVPTR(o) \
((RBASIC(o)->flags & ROBJECT_EMBED) ? \
ROBJECT(o)->as.ary : \
ROBJECT(o)->as.heap.ivptr)
Key in all these is RBASIC(o)->flags & ROBJECT_EMBED. The ROBJECT_EMBED flag indicates whether embedded instance variables are in use, or the heap is in use.
So embedded variables are only used when the number of instance variables is <= 3.

Related

Why does delegating to the default constructor not zero initialize member variable

#include <string>
struct T1 {
int _mem1;
int _mem2;
T1() = default;
T1(int mem2) : T1() { _mem2 = mem2; }
};
T1 getT1() { return T1(); }
T1 getT1(int mem2) { return T1(mem2); }
int main() {
volatile T1 a = T1();
std::printf("a._mem1=%d a._mem2=%d\n", a._mem1, a._mem2);
volatile T1 b = T1(1);
std::printf("b._mem1=%d b._mem2=%d\n", b._mem1, b._mem2);
// Temporarily disable
if (false) {
volatile T1 c = getT1();
std::printf("c._mem1=%d c._mem2=%d\n", c._mem1, c._mem2);
volatile T1 d = getT1(1);
std::printf("d._mem1=%d d._mem2=%d\n", d._mem1, d._mem2);
}
}
When I compile this with gcc5.4, I get the following output:
g++ -std=c++11 -O3 test.cpp -o test && ./test
a._mem1=0 a._mem2=0
b._mem1=382685824 b._mem2=1
Why does the user defined constructor, which delegates to the default constructor not manage to set _mem1 to zero for b, however a which uses the default constructor is zero initialized?
Valgrind confirms this also:
==12579== Conditional jump or move depends on uninitialised value(s)
==12579== at 0x4E87CE2: vfprintf (vfprintf.c:1631)
==12579== by 0x4E8F898: printf (printf.c:33)
==12579== by 0x4005F3: main (in test)
If I change if(false) to if(true)
Then the output is as you would expect
a._mem1=0 a._mem2=0
b._mem1=0 b._mem2=1
c._mem1=0 c._mem2=0
d._mem1=0 d._mem2=1
What is the compiler doing?
Short answer: for trivial types, the two distinct forms of "default construction" leads to two different initializations:
T a; in which case the object is default-initialized. Its value is undetermined and undefined behavior will soon happen (this is how is initialized b.mem1 and why valgrind detect an error.)
T a=T(); in which case the object is value-initialized and its entire memory is zeroed (this is what happens to a.mem1 and a.mem2)
Long answer: Actualy, the default constructor of T1 is not the cause of zero initialization of a.mem1. a has been first zero-initialized but not b because of a singular rule of the standard that does not apply for b's initializer.
The definition volatile a=T() causes a to be value-initialized (1). struct T1 as no user-provided default constructor (2). For such a struct the entire object is zero-initialized as stated by this rule of the C++11 standard [dcl.init]/7.2:
if T is a (possibly cv-qualified) non-union class type without a user-provided constructor, then the object is zero-initialized and, if T's implicitly-declared default constructor is non-trivial, that constructor is called.
There is a subtle difference between C++11 and C++17 that causes the definition volatile b=T(1) to be undefined behavior in C++11 but not in C++17. In C++11, b is initialized by copying an object type T1 which is initialized by the expression T(1). This copy construction evaluate T(1).mem1 which is an undetermined value. This is forbidden. In c++17, b is directly initialized by the prvalue expression T(1).
The evaluation of this undetermined value inside the printf is also undefined behavior independently of the c++ standard. This is why valgrind complains and why you see inconsistent outputs when you change if (true) to if (false).
(1) strictly speaking a is copy constructed from a value-initalized object in c++11
(2) T1's default constructor is not user provided because it is defined as defaulted on the first declaration
Short Answer
The default constructor in your code is considered trivial and that kind of constructor perform no actions i.e. leave things unitialized.
Longer answer
Trivial default constructor
The default constructor for class T is trivial (i.e. performs no
action) if all of the following is true:
The constructor is not user-provided (i.e., is implicitly-defined or defaulted on its first declaration)
T has no virtual member functions
T has no virtual base classes
T has no non-static members with default initializers.
(since C++11)
Every direct base of T has a trivial default constructor
Every non-static member of class type has a trivial default constructor
> A trivial default constructor is a constructor that performs no
action. All data types compatible with the C language (POD types) are
trivially default-constructible. Unlike in C, however, objects with
trivial default constructors cannot be created by simply
reinterpreting suitably aligned storage, such as memory allocated with
std::malloc: placement-new is required to formally introduce a new
object and avoid potential undefined behavior.
http://www.enseignement.polytechnique.fr/informatique/INF478/docs/Cpp/en/cpp/language/default_constructor.html

Synonym function call for ::new((void*)p)T(value);

I am trying to write my own Allocator which can be used in STL. That far I could almost successfully finish but with one function I have my problem:
The Allocator used in STL should provide the function construct [example from a standard allocator using new & delete]:
// initialize elements of allocated storage p with value value
void construct (T* p, const T& value)
{
::new((void*)p)T(value);
}
I am stuck how to rewrite this using my own function which replaces the new keyword initializing it with the value.
This function construct is for example used in this code: fileLines.push_back( fileLine );
where
MyVector<MyString> fileLines;
MyString fileLine;
These are my typedefs where I use my own Allocator:
template <typename T> using MyVector = std::vector<T, Allocator<T>>;
using MyString = std::basic_string<char, std::char_traits<char>, Allocator<char>>;
I am confused because here is allocated a pointer to T which can be for example [when I understood it correctly] MySstring.
Do I understand it correctly that the pointer - allocated by new - will have 10 bytes, when value is 123456789 and then the provided value is copied to the new pointer?
My question:
How to rewrite the one line of code using my own function? For me the difficult point is how to get the length of value [which can have any type] in order I can correctly determinate the length of the allocated block and how to copy it in order it works for all possible types T?
The new operator in the construct function does not allocate anything at all, it's a placement new call, which takes an already allocated chunk of memory (which needs to have been previously allocated some way, and at least as large as sizeof(T)) and initializes it as a T object by calling the constructor of T, pretending that the memory pointed to by p is a T object.
::new int(7) calls the default new operator, gets some memory big enough for an int, and constructs an int with the value 7 in it.
::new(ptr) int(7) takes a void* called ptr, and constructs an int with the value 7 in it. This is called "placement new".
The important part is what is missing from the second paragraph. It does not create space, but rather constructs an object in some existing space.
It is like saying ptr->T::T() ptr->int::int(7) where we "call the constructorofinton the memory location ofptr, exceptptr->T::T()orptr->int::int(7)are not valid syntaxes. Placementnew` is the way to explicitly call a constructor in C++.
Similarly, ptr->~T() or ptr->~int() will call the destructor on the object located at ptr (however, there is no ~int so that is an error, unless int is a template or dependent type, in which case it is a pseudo-destructor and the call it ignored instead of generating an error).
It is very rare that you want to change construct from the default implementation in an allocator. You might do this if your allocator wants to track creation of objects and attach the information about their arguments, without intrusively modifying the constructor. But this is a corner case.

Uniform initialization behavior different for different types in vector

I came across this article in which I read this example by one of the posters. I have quoted that here for convenience.
struct Foo
{
Foo(int i) {} // #1
Foo() {}
};
int main()
{
std::vector<Foo> f {10};
std::cout << f.size() << std::endl;
}
The above code, as written, emits “1” (10 is a converted to Foo by a
constructor that takes an int, then the vector’s initializer_list
constructor is called). If I comment out the line commented as #1, the
result is “10” (the initializer_list cannot be converted so the int
constructor is used).
My question is why does it emit a 10 if the int constructor is removed.
I understand that uniform initialization list works in the following order
1-Calls the initializer list if available or possible
2-Calls the default constructor if available
3-Does aggregate initialization
In the above case why is it creating 10 items in the vector since 1,2 and 3 are not possible ? Does this mean with uniform initialization a vector of items might always have different behaviors ?
Borrowing a quote from Scott Meyers in Effective Modern C++ (emphasis in original):
If, however, one or more constructors declare a parameter of type std::initializer_list, calls using the braced initialization syntax strongly prefer the overloads taking std;:initializer_lists. Strongly. If there's any way for compilers to construe a call using a braced initializer to be a constructor taking a std::initializer_list, compilers will employ that interpretation.
So when you have std::vector<Foo> f {10};, it will try to use the constructor of vector<Foo> that takes an initializer_list<Foo>. If Foo is constructible from an int, that is the constructor we're using - so we end up with one Foo constructed from 10.
Or, from the standardese, in [over.match.list]:
When objects of non-aggregate class type T are list-initialized (8.5.4), overload resolution selects the constructor
in two phases:
(1.1) — Initially, the candidate functions are the initializer-list constructors (8.5.4) of the class T and the
argument list consists of the initializer list as a single argument.
(1.2) — If no viable initializer-list constructor is found, overload resolution is performed again, where the
candidate functions are all the constructors of the class T and the argument list consists of the elements
of the initializer list.
If there is a viable initializer-list constructor, it is used. If you didn't have the Foo(int ) constructor, there would not be a viable initializer-list constructor, and overload resolution the second time around would find the constructor of vector that takes a size - and so you'd get a vector of 10 default-constructed Foos instead.

How do the different enum variants work in TypeScript?

TypeScript has a bunch of different ways to define an enum:
enum Alpha { X, Y, Z }
const enum Beta { X, Y, Z }
declare enum Gamma { X, Y, Z }
declare const enum Delta { X, Y, Z }
If I try to use a value from Gamma at runtime, I get an error because Gamma is not defined, but that's not the case for Delta or Alpha? What does const or declare mean on the declarations here?
There's also a preserveConstEnums compiler flag -- how does this interact with these?
There are four different aspects to enums in TypeScript you need to be aware of. First, some definitions:
"lookup object"
If you write this enum:
enum Foo { X, Y }
TypeScript will emit the following object:
var Foo;
(function (Foo) {
Foo[Foo["X"] = 0] = "X";
Foo[Foo["Y"] = 1] = "Y";
})(Foo || (Foo = {}));
I'll refer to this as the lookup object. Its purpose is twofold: to serve as a mapping from strings to numbers, e.g. when writing Foo.X or Foo['X'], and to serve as a mapping from numbers to strings. That reverse mapping is useful for debugging or logging purposes -- you will often have the value 0 or 1 and want to get the corresponding string "X" or "Y".
"declare" or "ambient"
In TypeScript, you can "declare" things that the compiler should know about, but not actually emit code for. This is useful when you have libraries like jQuery that define some object (e.g. $) that you want type information about, but don't need any code created by the compiler. The spec and other documentation refers to declarations made this way as being in an "ambient" context; it is important to note that all declarations in a .d.ts file are "ambient" (either requiring an explicit declare modifier or having it implicitly, depending on the declaration type).
"inlining"
For performance and code size reasons, it's often preferable to have a reference to an enum member replaced by its numeric equivalent when compiled:
enum Foo { X = 4 }
var y = Foo.X; // emits "var y = 4";
The spec calls this substitution, I will call it inlining because it sounds cooler. Sometimes you will not want enum members to be inlined, for example because the enum value might change in a future version of the API.
Enums, how do they work?
Let's break this down by each aspect of an enum. Unfortunately, each of these four sections is going to reference terms from all of the others, so you'll probably need to read this whole thing more than once.
computed vs non-computed (constant)
Enum members can either be computed or not. The spec calls non-computed members constant, but I'll call them non-computed to avoid confusion with const.
A computed enum member is one whose value is not known at compile-time. References to computed members cannot be inlined, of course. Conversely, a non-computed enum member is once whose value is known at compile-time. References to non-computed members are always inlined.
Which enum members are computed and which are non-computed? First, all members of a const enum are constant (i.e. non-computed), as the name implies. For a non-const enum, it depends on whether you're looking at an ambient (declare) enum or a non-ambient enum.
A member of a declare enum (i.e. ambient enum) is constant if and only if it has an initializer. Otherwise, it is computed. Note that in a declare enum, only numeric initializers are allowed. Example:
declare enum Foo {
X, // Computed
Y = 2, // Non-computed
Z, // Computed! Not 3! Careful!
Q = 1 + 1 // Error
}
Finally, members of non-declare non-const enums are always considered to be computed. However, their initializing expressions are reduced down to constants if they're computable at compile-time. This means non-const enum members are never inlined (this behavior changed in TypeScript 1.5, see "Changes in TypeScript" at the bottom)
const vs non-const
const
An enum declaration can have the const modifier. If an enum is const, all references to its members inlined.
const enum Foo { A = 4 }
var x = Foo.A; // emitted as "var x = 4;", always
const enums do not produce a lookup object when compiled. For this reason, it is an error to reference Foo in the above code except as part of a member reference. No Foo object will be present at runtime.
non-const
If an enum declaration does not have the const modifier, references to its members are inlined only if the member is non-computed. A non-const, non-declare enum will produce a lookup object.
declare (ambient) vs non-declare
An important preface is that declare in TypeScript has a very specific meaning: This object exists somewhere else. It's for describing existing objects. Using declare to define objects that don't actually exist can have bad consequences; we'll explore those later.
declare
A declare enum will not emit a lookup object. References to its members are inlined if those members are computed (see above on computed vs non-computed).
It's important to note that other forms of reference to a declare enum are allowed, e.g. this code is not a compile error but will fail at runtime:
// Note: Assume no other file has actually created a Foo var at runtime
declare enum Foo { Bar }
var s = 'Bar';
var b = Foo[s]; // Fails
This error falls under the category of "Don't lie to the compiler". If you don't have an object named Foo at runtime, don't write declare enum Foo!
A declare const enum is not different from a const enum, except in the case of --preserveConstEnums (see below).
non-declare
A non-declare enum produces a lookup object if it is not const. Inlining is described above.
--preserveConstEnums flag
This flag has exactly one effect: non-declare const enums will emit a lookup object. Inlining is not affected. This is useful for debugging.
Common Errors
The most common mistake is to use a declare enum when a regular enum or const enum would be more appropriate. A common form is this:
module MyModule {
// Claiming this enum exists with 'declare', but it doesn't...
export declare enum Lies {
Foo = 0,
Bar = 1
}
var x = Lies.Foo; // Depend on inlining
}
module SomeOtherCode {
// x ends up as 'undefined' at runtime
import x = MyModule.Lies;
// Try to use lookup object, which ought to exist
// runtime error, canot read property 0 of undefined
console.log(x[x.Foo]);
}
Remember the golden rule: Never declare things that don't actually exist. Use const enum if you always want inlining, or enum if you want the lookup object.
Changes in TypeScript
Between TypeScript 1.4 and 1.5, there was a change in the behavior (see https://github.com/Microsoft/TypeScript/issues/2183) to make all members of non-declare non-const enums be treated as computed, even if they're explicitly initialized with a literal. This "unsplit the baby", so to speak, making the inlining behavior more predictable and more cleanly separating the concept of const enum from regular enum. Prior to this change, non-computed members of non-const enums were inlined more aggressively.
There are a few things going on here. Let's go case by case.
enum
enum Cheese { Brie, Cheddar }
First, a plain old enum. When compiled to JavaScript, this will emit a lookup table.
The lookup table looks like this:
var Cheese;
(function (Cheese) {
Cheese[Cheese["Brie"] = 0] = "Brie";
Cheese[Cheese["Cheddar"] = 1] = "Cheddar";
})(Cheese || (Cheese = {}));
Then when you have Cheese.Brie in TypeScript, it emits Cheese.Brie in JavaScript which evaluates to 0. Cheese[0] emits Cheese[0] and actually evaluates to "Brie".
const enum
const enum Bread { Rye, Wheat }
No code is actually emitted for this! Its values are inlined. The following emit the value 0 itself in JavaScript:
Bread.Rye
Bread['Rye']
const enums' inlining might be useful for performance reasons.
But what about Bread[0]? This will error out at runtime and your compiler should catch it. There's no lookup table and the compiler doesn't inline here.
Note that in the above case, the --preserveConstEnums flag will cause Bread to emit a lookup table. Its values will still be inlined though.
declare enum
As with other uses of declare, declare emits no code and expects you to have defined the actual code elsewhere. This emits no lookup table:
declare enum Wine { Red, Wine }
Wine.Red emits Wine.Red in JavaScript, but there won't be any Wine lookup table to reference so it's an error unless you've defined it elsewhere.
declare const enum
This emits no lookup table:
declare const enum Fruit { Apple, Pear }
But it does inline! Fruit.Apple emits 0. But again Fruit[0] will error out at runtime because it's not inlined and there's no lookup table.
I've written this up in this playground. I recommend playing there to understand which TypeScript emits which JavaScript.

Lambda expression in c++, OS X's clang vs GCC

A particular property of c++'s lambda expressions is to capture the variables in the scope in which they are declared. For example I can use a declared and initialized variable c in a lambda function even if 'c' is not sent as an argument, but it's captured by '[ ]':
#include<iostream>
int main ()
{int c=5; [c](int d){std::cout<<c+d<<'\n';}(5);}
The expected output is thus 10. The problem arises when at least 2 variables, one captured and the other sent as an argument, have the same name:
#include<iostream>
int main ()
{int c=5; [c](int c){std::cout<<c<<'\n';}(3);}
I think that the 2011 standard for c++ says that the captured variable has the precedence on the arguments of the lambda expression in case of coincidence of names. In fact compiling the code using GCC 4.8.1 on Linux the output I get is the expected one, 5. If I compile the same code using apple's version of clang compiler (clang-503.0.40, the one which comes with Xcode 5.1.1 on Mac OS X 10.9.4) I get the other answer, 3.
I'm trying to figure why this happens; is it just an apple's compiler bug (if the standard for the language really says that the captured 'c' has the precedence) or something similar? Can this issue be fixed?
EDIT
My teacher sent an email to GCC help desk, and they answered that it's clearly a bug of GCC compiler and to report it to Bugzilla. So Clang's behavior is the correct one!
From my understanding of the c++11 standard's points below:
5.1.2 Lambda expressions
3 The type of the lambda-expression (which is also the type of the closure object) is a unique, unnamed non-union class type — called
the closure type — whose properties are described below.
...
5 The closure type for a lambda-expression has a public inline function call operator (13.5.4) whose parameters and return type are
described by the lambda-expression’s parameter-declaration-clause and
trailing-return-type respectively. This function call operator is
declared const (9.3.1) if and only if the lambda-expression’s
parameter-declaration-clause is not followed by mutable.
...
14 For each entity captured by copy, an unnamed non static data member is declared in the closure type
A lambda expression like this...
int c = 5;
[c](int c){ std::cout << c << '\n'; }
...is roughly equivalent to a class/struct like this:
struct lambda
{
int c; // captured c
void operator()(int c) const
{
std::cout << c << '\n';
}
};
So I would expect the parameter to hide the captured member.
EDIT:
In point 14 from the standard (quoted above) it would seem the data member created from the captured variable is * unnamed *. The mechanism by which is it referenced appears to be independent of the normal identifier lookups:
17 Every id-expression that is an odr-use (3.2) of an entity captured by copy is transformed into an access to the corresponding unnamed data member of the closure type.
It is unclear from my reading of the standard if this transformation should take precedence over parameter symbol lookup.
So perhaps this should be marked as UB (undefined behaviour)?
From the C++11 Standard, 5.1.2 "Lambda expressions" [expr.prim.lambda] #7:
The lambda-expression’s compound-statement yields the function-body (8.4) of the function call operator,
but for purposes of name lookup (3.4), determining the type and value of this (9.3.2) and transforming id-expressions
referring to non-static class members into class member access expressions using (*this) (9.3.1),
the compound-statement is considered in the context of the lambda-expression.
Also, from 3.3.3 "Block scope" [basic.scope.local] #2:
The potential scope of a function parameter name (including one appearing in a lambda-declarator) or of
a function-local predefined variable in a function definition (8.4) begins at its point of declaration.
Names in a capture list are not declarations and therefore do not affect name lookup. The capture list just allows you to use the local variables; it does not introduce their names into the lambda's scope. Example:
int i, j;
int main()
{
int i = 0;
[](){ i; }; // Error: Odr-uses non-static local variable without capturing it
[](){ j; }; // OK
}
So, since the parameters to a lambda are in an inner block scope, and since name lookup is done in the context of the lambda expression (not, say, the generated class), the parameter names indeed hide the variable names in the enclosing function.

Resources