Size of class in bytes - ruby

Is there a method to see the size of allocated memory for a class in ruby?
I have built a custom class and I would like to know its size in memory. So is there a function with the likeness of sizeof() in C?
I am simply trying to initialize a new class like so
test = MyClass.new
and trying to find a method to print out the size of the class that has been allocated to memory.
Is this even possible in ruby?

There is no language feature that calculates the size of a class in the same way as C.
The memory size of an object is implementation dependent. It depends on the implementation of the base class object. It is also not simple to estimate the memory used. For example, strings can be embedded in an RString structure if they are short, but stored in the heap if they are long (Never create Ruby strings longer than 23 characters).
The memory taken by some objects has been tabulated for different ruby implementations: Memory footprint of objects in Ruby 1.8, EE, 1.9, and OCaml
Finally, the object size may differ even with two objects from the same class, since it is possible to arbitrarily add extra instance variables, without hardcoding what instance variables are present. For example, see instance_variable_get and instance_variable_set
If you use MRI ruby 1.9.2+, there is a method you can try (be warned that it is looking at only part of the object, this is obvious from the fact that integers and strings appear to have zero size):
irb(main):176:0> require 'objspace'
=> true
irb(main):176:0> ObjectSpace.memsize_of(134)
=> 0
irb(main):177:0> ObjectSpace.memsize_of("asdf")
=> 0
irb(main):178:0> ObjectSpace.memsize_of({a: 4})
=> 184
irb(main):179:0> ObjectSpace.memsize_of({a: 4, b: 5})
=> 232
irb(main):180:0> ObjectSpace.memsize_of(/a/.match("a"))
=> 80
You can also try memsize_of_all (note that it looks at the memory usage of the whole interpreter, and overwriting a variable does not appear to delete the old copy immediately):
irb(main):190:0> ObjectSpace.memsize_of_all
=> 4190347
irb(main):191:0> asdf = 4
=> 4
irb(main):192:0> ObjectSpace.memsize_of_all
=> 4201350
irb(main):193:0> asdf = 4
=> 4
irb(main):194:0> ObjectSpace.memsize_of_all
=> 4212353
irb(main):195:0> asdf = 4.5
=> 4.5
irb(main):196:0> ObjectSpace.memsize_of_all
=> 4223596
irb(main):197:0> asdf = "a"
=> "a"
irb(main):198:0> ObjectSpace.memsize_of_all
=> 4234879
You should be very careful because there is no guarantee when the Ruby interpreter will perform garbage collection. While you might use this for testing and experimentation, it is recommended that this is NOT used in production!

Related

Can you mutate value of Concurrent::ThreadLocalVar?

I'm using Concurrent::ThreadLocalVar from concurrent-ruby.
Is it okay to mutate the value like this:
numbers = Concurrent::ThreadLocalVar.new([])
numbers.value # => []
numbers.value.append(1)
numbers.value # => [1]
Or should I reassign the value? Like this:
numbers = Concurrent::ThreadLocalVar.new([])
numbers.value # => []
numbers.value = numbers.value.append(1)
numbers.value # => [1]
You can mutate the array in place, as long as you make sure to not "move" a reference to it across thread-boundaries. That is, as long as you access the value only from a single thread (i.e. the same thread with which you got the value) things will be fine.
If you need to access the array from multiple threads concurrently, you likely should use a different abstraction, such as explicit locking with a Thread::Mutex or by using a Concurrent::Array.
In any case, you must always be careful when accessing objects within different threads to ensure concurrent access is safe. If in doubt, try to avoid moving mutable (i.e. non-frozen) objects across thread-boundaries.

How to unfreeze an object in Ruby?

In Ruby, there is Object#freeze, which prevents further modifications to the object:
class Kingdom
attr_accessor :weather_conditions
end
arendelle = Kingdom.new
arendelle.frozen? # => false
arendelle.weather_conditions = 'in deep, deep, deep, deep snow'
arendelle.freeze
arendelle.frozen? # => true
arendelle.weather_conditions = 'sun is shining'
# !> RuntimeError: can't modify frozen Kingdom
script = 'Do you want to build a snowman?'.freeze
script[/snowman/] = 'castle of ice'
# !> RuntimeError: can't modify frozen String
However, there is no Object#unfreeze. Is there a way to unfreeze a frozen kingdom?
Update: As of Ruby 2.7 this no longer works!
Yes and no. There isn't any direct way using the standard API. However, with some understanding of what #freeze? does, you can work around it. Note: everything here is implementation details of MRI's current version and might be subject to change.
Objects in CRuby are stored in a struct RVALUE.
Conveniently, the very first thing in the struct is VALUE flags;.
All Object#freeze does is set a flag, called FL_FREEZE, which is actually equal to RUBY_FL_FREEZE. RUBY_FL_FREEZE will basically be the 11th bit in the flags.
All you have to do to unfreeze the object is unset the 11th bit.
To do that, you could use Fiddle, which is part of the standard library and lets you tinker with the language on C level:
require 'fiddle'
class Object
def unfreeze
Fiddle::Pointer.new(object_id * 2)[1] &= ~(1 << 3)
end
end
Non-immediate value objects in Ruby are stored on address = their object_id * 2. Note that it's important to make the distinction so you would be aware that this wont let you unfreeze Fixnums for example.
Since we want to change the 11th bit, we have to work with the 3th bit of the second byte. Hence we access the second byte with [1].
~(1 << 3) shifts 1 three positions and then inverts the result. This way the only bit which is zero in the mask will be the third one and all other will be ones.
Finally, we just apply the mask with bitwise and (&=).
foo = 'A frozen string'.freeze
foo.frozen? # => true
foo.unfreeze
foo.frozen? # => false
foo[/ (?=frozen)/] = 'n un'
foo # => 'An unfrozen string'
No, according to the documentation for Object#freeze:
There is no way to unfreeze a frozen object.
The frozen state is stored within the object. Calling freeze sets the frozen state and thereby prevents further modification. This includes modifications to the object's frozen state.
Regarding your example, you could assign a new string instead:
script = 'Do you want to build a snowman?'
script.freeze
script = script.dup if script.frozen?
script[/snowman/] = 'castle of ice'
script #=> "Do you want to build a castle of ice?"
Ruby 2.3 introduced String#+#, so you can write +str instead of str.dup if str.frozen?
frozen_object = %w[hello world].freeze
frozen_object.concat(['and universe']) # FrozenError (can't modify frozen Array)
frozen_object.dup.concat(['and universe']) # ['hello', 'world', 'and universe']
As noted above copying the variable back into itself also effectively unfreezes the variable.
As noted this can be done using the .dup method:
var1 = var1.dup
This can also be achieved using:
var1 = Marshal.load(Marshal.dump(var1))
I have been using Marshal.load(Marshal.dump( ... )
I have not used .dup and only learned about it through this post.
I do not know what if any differences there are between Marshal.load(Marshal.dump( ... )
If they do the same thing or .dup is more powerful, then stylistically I like .dup better. .dup states what to do -- copy this thing, but it does not say how to do it, whereas Marshal.load(Marshal.dump( ... ) is not only excessively verbose, but states how to do the duplication -- I am not a fan of specifying the HOW part if the HOW part is irrelevant to me. I want to duplicate the value of the variable, I do not care how.

Some `Fixnum` properties

I found the following features of Fixnum in doc.
Fixnum objects have immediate value. This means that when they are assigned or passed as parameters, the actual object is passed, rather than a reference to that object.
Can the same be shown in IRB? Hope then only it will be correctly understood by me?
Assignment does not alias Fixnum objects.
What does it actually then?
There is effectively only one Fixnum object instance for any given integer value, so, for example, you cannot add a singleton method to a Fixnum.
Couldn't understand the reason not to add the singleton method with Fixnum object instances.
I gave some try to the second point as below:
a = 1 # => 1
a.class # => Fixnum
b = a # => 1
b.class # => Fixnum
a == b # => true
a.equal? b # => true
a.object_id == b.object_id # => true
But still I am in confusion. Can anyone help me here to understand the core of those features please?
In Ruby, most objects require memory to store their class and instance variables. Once this memory is allocated, Ruby represents each object by this memory location. When the object is assigned to a variable or passed to a function, it is the location of this memory that is passed, not the data at this memory. Singleton methods make use of this. When you define a singleton method, Ruby silently replaces the objects class with a new singleton class. Because each object stores its class, Ruby can easily replace an object's class with a new class that implements the singleton methods (and inherits from the original class).
This is no longer true for objects that are immediate values: true, false, nil, all symbols, and integers that are small enough to fit within a Fixnum. Ruby does not allocate memory for instances of these objects, it does not internally represent the objects as a location in memory. Instead, it infers the instance of the object based on its internal representation. What this means is twofold:
The class of each object is no longer stored in memory at a particular location, and is instead implicitly determined by the type of immediate object. This is why Fixnums cannot have singleton methods.
Immediate objects with the same state (e.g., two Fixnums of integer 2378) are actually the same instance. This is because the instance is determined by this state.
To get a better sense of this, consider the following operations on a Fixnum:
>> x = 3 + 7
=> 10
>> x.object_id == 10.object_id
=> true
>> x.object_id == (15-5).object_id
=> true
Now, consider them using strings:
>> x = "a" + "b"
=> "ab"
>> x.object_id == "ab".object_id
=> false
>> x.object_id == "Xab"[1...3].object_id
=> false
>> x == "ab"
=> true
>> x == "Xab"[1...3]
=> true
The reason the object ids of the Fixnums are equal is that they're immediate objects with the same internal representation. The strings, on the other hand, exist in allocated memory. The object id of each string is the location of its object state in memory.
Some low-level information
To understand this, you have to understand how Ruby (at least 1.8 and 1.9) treat Fixnums internally. In Ruby, all objects are represented in C code by variables of type VALUE. Ruby imposes the following requirements for VALUE:
The type VALUE is is the smallest integer of sufficient size to hold a pointer. This means, in C, that sizeof(VALUE) == sizeof(void*).
Any non-immediate object must be aligned on a 4-byte boundary. This means that any object allocated by Ruby will have address 4*i for some integer i. This also means that all pointers have zero values in their two least significant bits.
The first requirement allows Ruby to store both pointers to objects and immediate values in a variable of type VALUE. The second requirement allows Ruby to detect Fixnum and Symbol objects based on the two least significant bits.
To make this more concrete, consider the internal binary representation of a Ruby object z, which we'll call Rz in a 32-bit architecture:
MSB LSB
3 2 1
1098 7654 3210 9876 5432 1098 7654 32 10
XXXX XXXX XXXX XXXX XXXX XXXX XXXX AB CD
Ruby then interprets Rz, the representation of z, as follows:
If D==1, then z is a Fixnum. The integer value of this Fixnum is stored in the upper 31 bits of the representation, and is recovered by performing an arithmetic right shift to recover the signed integer stored in these bits.
Three special representations are tested (all with D==0)
if Rz==0, then z is false
if Rz==2, then z is true
if Rz==4, then z is nil
If ABCD == 1110, then 'z' is a Symbol. The symbol is converted into a unique ID by right-shifting the eight least-significant bits (i.e., z>>8 in C). On 32-bit architectures, this allows 2^24 different IDs (over 10 million). On 64-bit architectures, this allows 2^48 different IDs.
Otherwise, Rz represents an address in memory for an instance of a Ruby object, and the type of z is determined by the class information at that location.
Well...
This is an internal implementation detail on MRI. You see, in Ruby (on the C side of things), all Ruby objects are a VALUE. In most cases, a VALUE is a pointer to an object living on the stack. But the immediate values (Fixnums, true, false, nil, and something else) live in the VALUE where a pointer to an object would usually live.
It makes the variable the exact same object. Now, I have no idea how this works internally, because it assigns it to the VALUE itself, but it does.
Because every time I use the number 1 in my program, I'm using the same object. So if I define a singleton method on 1, every place in my program, required programs, etc., will have that singleton method. And singleton methods are usually used for local monkeypatching. So, to prevent this, Ruby just doesn't let you.

Can Ruby handle large hash objects

I want to compare two hashes. Each can have more than 20,000 objects.
I have the following questions:
Can Ruby handle such a large amount of objects?
Will comparing these two hashes take a lot of time?
Can indexing be applied to reduce enumerations?
The hashes themselves are fast and not bound to low limits. E.g. this does not even take a millisecond here (Ruby 1.9.2 on Windows):
irb(main):008:0> hash1 = (0...20000).inject({}) { | r, i | r[rand(100)*100000 + i] = rand; r } ; 23
=> 23
irb(main):009:0> hash2 = (0...20000).inject({}) { | r, i | r[rand(100)*100000 + i] = rand; r } ; 23
=> 23
irb(main):010:0> hash3 = hash1.dup ; 23
=> 23
irb(main):011:0> hash1 == hash2
=> false
irb(main):012:0> hash1 == hash3
=> true
Everything else depends on what you are stuffing into the hashes.
Rails is a framework and has little to do with object comparison. Ruby is certainly capable of comparing 20,000 objects, assuming they fit well into memory or you are comparing them in a batch process that limits how many are instantiated at any time.
If you are are talking about comparing 20,000 ActiveRecord objects in-memory you will probably run out of memory and may experience pretty slow results even if you don't. ActiveRecord is pretty heavy-weight and not the best tool for working with large numbers of objects. However, I don't know what these 20,000 objects are or exactly how you are comparing them, so maybe they don't have to all be in-memory simultaneously and a batch process could complete this in a time frame you find acceptable.
If these are simple objects in a simple ruby hash you can certainly iterate through them pretty quickly (though what is fast is completely dependent on what this is for). If your comparison logic is pretty simple it shouldn't be too time consuming, assuming each object in your first hash is compared to a single corresponding object in the second hash. If every object in hash 1 is compared to each of the 20,000 in hash 2 then your total comparisons (20,000 * 20,0000) are much larger and this might not be as fast as you need.

In Ruby, why does inspect() print out some kind of object id which is different from what object_id() gives?

When the p function is used to print out an object, it may give an ID, and it is different from what object_id() gives. What is the reason for the different numbers?
Update: 0x4684abc is different from 36971870, which is 0x234255E
>> a = Point.new
=> #<Point:0x4684abc>
>> a.object_id
=> 36971870
>> a.__id__
=> 36971870
>> "%X" % a.object_id
=> "234255E"
The default implementation of inspect calls the default implementation of to_s, which just shows the hexadecimal value of the object directly, as seen in the Object#to_s docs (click on the method description to reveal the source).
Meanwhile the comments in the C source underlying the implementation of object_id shows that there are different “namespaces” for Ruby values and object ids, depending on the type of the object (e.g. the lowest bit seems to be zero for all but Fixnums). You can see that in Object#object_id docs (click to reveal the source).
From there we can see that in the “object id space” (returned by object_id) the ids of objects start from the second bit on the right (with the first bit being zero), but in “value space” (used by inspect) they start from the third bit on the right (with the first two bits zero). So, to convert the values from the “object id space” to the “value space”, we can shift the object_id to the left by one bit and get the same result that is shown by inspect:
> '%x' % (36971870 << 1)
=> "4684abc"
> a = Foo.new
=> #<Foo:0x5cfe4>
> '%x' % (a.object_id << 1)
=> "5cfe4"
0x234255E
=>36971870
It's not different, it's the hexadecimal representation of the memory address:-)

Resources