Some `Fixnum` properties - ruby

I found the following statements about Fixnum in the documentation.
Fixnum objects have immediate value. This means that when they are assigned or passed as parameters, the actual object is passed, rather than a reference to that object.
Can this be demonstrated in IRB? I think that is the only way I will understand it properly.
Assignment does not alias Fixnum objects.
What does it actually do, then?
There is effectively only one Fixnum object instance for any given integer value, so, for example, you cannot add a singleton method to a Fixnum.
I couldn't understand why a singleton method cannot be added to a Fixnum instance.
I tried out the second point as follows:
a = 1 # => 1
a.class # => Fixnum
b = a # => 1
b.class # => Fixnum
a == b # => true
a.equal? b # => true
a.object_id == b.object_id # => true
But I am still confused. Can anyone help me understand what these statements actually mean?

In Ruby, most objects require memory to store their class and instance variables. Once this memory is allocated, Ruby represents each object by this memory location. When the object is assigned to a variable or passed to a function, it is the location of this memory that is passed, not the data at this memory. Singleton methods make use of this. When you define a singleton method, Ruby silently replaces the object's class with a new singleton class. Because each object stores its class, Ruby can easily replace an object's class with a new class that implements the singleton methods (and inherits from the original class).
This is no longer true for objects that are immediate values: true, false, nil, all symbols, and integers that are small enough to fit within a Fixnum. Ruby does not allocate memory for instances of these objects, and it does not internally represent them as a location in memory. Instead, it infers the instance of the object based on its internal representation. What this means is twofold:
The class of each object is no longer stored in memory at a particular location, and is instead implicitly determined by the type of immediate object. This is why Fixnums cannot have singleton methods.
Immediate objects with the same state (e.g., two Fixnums of integer 2378) are actually the same instance. This is because the instance is determined by this state.
To get a better sense of this, consider the following operations on a Fixnum:
>> x = 3 + 7
=> 10
>> x.object_id == 10.object_id
=> true
>> x.object_id == (15-5).object_id
=> true
Now, consider them using strings:
>> x = "a" + "b"
=> "ab"
>> x.object_id == "ab".object_id
=> false
>> x.object_id == "Xab"[1...3].object_id
=> false
>> x == "ab"
=> true
>> x == "Xab"[1...3]
=> true
The reason the object ids of the Fixnums are equal is that they're immediate objects with the same internal representation. The strings, on the other hand, exist in allocated memory. The object id of each string is the location of its object state in memory.
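The singleton-method restriction can be checked directly. The following is a minimal sketch, assuming a reasonably recent MRI; `accepts_singleton_method?` is a hypothetical helper written for this illustration, not part of Ruby:

```ruby
# Hypothetical helper: reports whether a singleton method can be attached to obj.
def accepts_singleton_method?(obj)
  obj.define_singleton_method(:shout) { to_s.upcase }
  true
rescue TypeError
  # MRI raises TypeError ("can't define singleton") for integers and symbols.
  false
end

a = 1
b = a
puts a.equal?(b)                                  # true: both names refer to the same object
puts accepts_singleton_method?(String.new('ab'))  # true: a heap object stores its own class
puts accepts_singleton_method?(10)                # false
puts accepts_singleton_method?(:sym)              # false
```

The string accepts the method because it has its own slot for a class pointer, while the integer and the symbol have nowhere to keep one.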
Some low-level information
To understand this, you have to understand how Ruby (at least 1.8 and 1.9) treats Fixnums internally. In Ruby, all objects are represented in C code by variables of type VALUE. Ruby imposes the following requirements for VALUE:
The type VALUE is the smallest integer of sufficient size to hold a pointer. This means, in C, that sizeof(VALUE) == sizeof(void*).
Any non-immediate object must be aligned on a 4-byte boundary. This means that any object allocated by Ruby will have address 4*i for some integer i. This also means that all pointers have zero values in their two least significant bits.
The first requirement allows Ruby to store both pointers to objects and immediate values in a variable of type VALUE. The second requirement allows Ruby to detect Fixnum and Symbol objects based on the two least significant bits.
To make this more concrete, consider the internal binary representation of a Ruby object z, which we'll call Rz in a 32-bit architecture:
MSB                                  LSB
3 2            1
1098 7654 3210 9876 5432 1098 7654 32 10
XXXX XXXX XXXX XXXX XXXX XXXX XXXX AB CD
Ruby then interprets Rz, the representation of z, as follows:
If D==1, then z is a Fixnum. The integer value of this Fixnum is stored in the upper 31 bits of the representation, and is recovered by performing an arithmetic right shift to recover the signed integer stored in these bits.
Three special representations are tested (all with D==0)
if Rz==0, then z is false
if Rz==2, then z is true
if Rz==4, then z is nil
If ABCD == 1110, then z is a Symbol. The symbol is converted into a unique ID by discarding the eight least-significant bits with a right shift (i.e., Rz>>8 in C). On 32-bit architectures, this allows 2^24 different IDs (over 16 million). On 64-bit architectures, this allows 2^56 different IDs.
Otherwise, Rz represents an address in memory for an instance of a Ruby object, and the type of z is determined by the class information at that location.
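The decoding rules above can be modelled in a few lines of Ruby. This is only a sketch of the Ruby 1.8/1.9 scheme as described here, not of what current interpreters do; `encode_fixnum` and `decode` are hypothetical names introduced for the illustration:

```ruby
# Model of the VALUE tagging rules listed above (Ruby 1.8/1.9 era).
# Both methods are hypothetical helpers, not part of any Ruby API.

# A Fixnum n is stored as n shifted left by one, with the low bit (D) set.
def encode_fixnum(n)
  (n << 1) | 1
end

# Interprets the raw representation rz following the rules in the text.
def decode(rz)
  return [:fixnum, rz >> 1] if rz & 1 == 1          # D == 1; >> is an arithmetic shift
  return [:false, nil] if rz == 0
  return [:true, nil]  if rz == 2
  return [:nil, nil]   if rz == 4
  return [:symbol, rz >> 8] if rz & 0xf == 0b1110   # ABCD == 1110, ID in the upper bits
  [:pointer, rz]                                    # otherwise: address of a heap object
end

p decode(encode_fixnum(2378))  # [:fixnum, 2378]
p decode(encode_fixnum(-3))    # [:fixnum, -3]
p decode(0x10e)                # [:symbol, 1]
p decode(0x1000)               # [:pointer, 4096]
```

Because the integer is recovered from the bits of the representation itself, two Fixnums with the same value necessarily have the same representation, which is why they are the same object.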

Well...
This is an internal implementation detail of MRI. You see, in Ruby (on the C side of things), all Ruby objects are a VALUE. In most cases, a VALUE is a pointer to an object living on the heap. But the immediate values (Fixnums, true, false, nil, and symbols) live in the VALUE where a pointer to an object would usually live.
Assignment makes the variable refer to the exact same object. I don't know exactly how this works internally, since the value is stored in the VALUE itself, but it does.
Because every time I use the number 1 in my program, I'm using the same object. So if I define a singleton method on 1, every place in my program, required programs, etc., will have that singleton method. And singleton methods are usually used for local monkeypatching. So, to prevent this, Ruby just doesn't let you.

Related

Maintaining Ruby Set by object ID

I'm developing an algorithm in Ruby with the following properties:
It works on two objects of type Set, where each element is an Array, where all elements are of type String
Each Array involved has the same number of elements
No two arrays happen to have the same content (when comparing with ==)
The algorithm involves many operations of moving an array from one Set to the other (or back), storing references to certain Arrays, and testing whether or not that reference is part of a Set
There is no duplication of the Arrays; all Arrays keep their object ID the whole time.
A naive implementation would do something like this (to give you the idea); in practice, the arrays here have longer strings and more elements:
require 'set'

# Set up all Arrays involved
master = [
  %w(a b c d),
  %w(a b c x),
  %w(u v w y),
  # .... and so on
]
# Create initial sets.
x = Set.new
y = Set.new
# ....
x.add(master[0])
x.add(master[2])
y.add(master[1])
# ....
# Operating on the sets.
i = 1
# ...
arr = master[i]
# Move element arr from y to x, if it is in y
if y.member?(arr)
  y.delete(arr)
  x.add(arr)
end
# Do something with the sets
x.each { |member| puts member.inspect }
This would indeed work, simply because the arrays are all different in content. However, testing for membership means that y.member?(arr) tests whether our Set already contains an object with the same array content as arr, while it would be sufficient to test whether it already contains an element with the same object_id, so I'm worried about performance. From my understanding, finding the object id of an object is cheap, and since it is just a number, maintaining a set of numbers is more performant than maintaining a set of arrays of strings.
Therefore I could try to define my two sets as sets of object_id, and the membership test would be faster. However, when iterating over a Set, using the object_id to find the array itself is expensive (I would have to search ObjectSpace).
Another possibility would be to not maintain the set of arrays, but the set of indexes into my master array. My code would then be, for example,
x.add(0) # instead of x.add(master[0])
and iterating over a Set would be, e.g.,
x.each { |i| puts master[i].inspect }
I wonder whether there is a better way - for instance that we can somehow "teach" Set.new to use object identity for maintaining its members, instead of equality.
I think you’re looking for Set#compare_by_identity, which makes the set compare its members by identity (i.e. object ID) rather than by content.
x = Set.new
x.compare_by_identity
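To see the difference this makes, here is a small sketch; `identity_set` is a hypothetical helper introduced only for this example:

```ruby
require 'set'

# Hypothetical helper: builds a Set that compares its members by object identity.
def identity_set(*members)
  set = Set.new
  set.compare_by_identity
  members.each { |m| set.add(m) }
  set
end

first  = %w(a b c d)
second = %w(a b c d)                   # equal content, but a distinct object

x = identity_set(first)
puts x.member?(first)                  # true: the very same object
puts x.member?(second)                 # false: equal content is not enough
puts Set.new([first]).member?(second)  # true: a default Set compares by content
```

With identity comparison the set never has to hash or compare the strings inside each array, which is the cost the question was worried about.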

Multiplying string by integer vs integer by string in ruby

I was playing around in irb, and noticed that one cannot do
5 * "Hello"
It raises the error
String can't be coerced into Fixnum
However, "Hello" * 5 produced "HelloHelloHelloHelloHello" as expected.
What is the exact reason for this? I've been looking around in the docs and could not find the reason for this behavior. Is this something the designers of Ruby decided?
Basically, you are asking "why is multiplication not commutative"? There are two possible answers for this. Or rather one answer with two layers.
The basic principle of OO is that everything happens as the result of one object sending a message to another object and that object responding to that message. This "messaging" metaphor is very important, because it explains a lot of things in OO. For example, if you send someone a message, all you can observe is what their response is. You don't know, and have no way of finding out, what they did to come up with that response. They could have just handed out a pre-recorded response (reference an instance variable). They could have worked hard to construct a response (execute a method). They could have handed the message off to someone else (delegation). Or, they just don't understand the message you are sending them (NoMethodError).
Note that this means that the receiver of the message is in total control. The receiver can respond in any way it wishes. This makes message sending inherently non-commutative. Sending message foo to a passing b as an argument is fundamentally different from sending message foo to b passing a as an argument. In one case, it is a and only a that decides how to respond to the message, in the other case it is b and only b.
Making this commutative requires explicit cooperation between a and b. They must agree on a common protocol and adhere to that protocol.
In Ruby, binary operators are simply message sends to the left operand. So, it is solely the left operand that decides what to do.
So, in
'Hello' * 5
the message * is sent to the receiver 'Hello' with the argument 5. In fact, you can alternately write it like this if you want, which makes this fact more obvious:
'Hello'.*(5)
'Hello' gets to decide how it responds to that message.
Whereas in
5 * 'Hello'
it is 5 which gets to decide.
So, the first layer of the answer is: Message sending in OO is inherently non-commutative, there is no expectation of commutativity anyway.
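The asymmetry can be observed by sending the message explicitly. A minimal sketch, assuming a current Ruby where small integers report their class as Integer; `try_multiply` is a hypothetical helper for this illustration:

```ruby
# Hypothetical helper: sends the * message to the receiver and reports the outcome.
def try_multiply(receiver, argument)
  receiver.public_send(:*, argument)
rescue TypeError => e
  # The receiver understood * but refused this kind of argument.
  "#{e.class}: receiver #{receiver.class} rejected #{argument.class}"
end

puts try_multiply('Hello', 5)   # HelloHelloHelloHelloHello
puts try_multiply(5, 'Hello')   # TypeError: receiver Integer rejected String
```

The same message name with the operands swapped ends up at a different receiver, and each receiver applies its own rules.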
But, now the question becomes, why don't we design in some commutativity? For example, one possible way would be to interpret binary operators not as message sends to one of the operands but instead message sends to some third object. E.g., we could interpret
5 * 'Hello'
as
*(5, 'Hello')
and
'Hello' * 5
as
*('Hello', 5)
i.e. as message sends to self. Now, the receiver is the same in both cases and the receiver can arrange for itself to treat the two cases identically and thus make * commutative.
Another, similar possibility would be to use some sort of shared context object, e.g. make
5 * 'Hello'
equivalent to
Operators.*(5, 'Hello')
In fact, in mathematics, the meaning of a symbol is often dependent on context, e.g. in ℤ, 2 / 3 is undefined, in ℚ, it is 2/3, and in IEEE754, it is something close to, but not exactly identical to 0.666…. Or, in ℤ, 2 * 3 is 6, but in ℤ/5ℤ (the integers modulo 5), 2 * 3 is 1.
So, it would certainly make sense to do this. Alas, it isn't done.
Another possibility would be to have the two operands cooperate using a standard protocol. In fact, for arithmetic operations on Numerics, there actually is such a protocol! If a receiver doesn't know what to do with an operand, it can ask that operand to coerce itself, the receiver, or both to something the receiver does know how to handle.
Basically, the protocol goes like this:
you call 5 * 'Hello'
5 doesn't know how to handle 'Hello', so it asks 'Hello' for a coercion. …
… 5 calls 'Hello'.coerce(5)
'Hello' responds with a pair of objects [a, b] (as an Array) such that a * b has the desired result
5 calls a * b
One common trick is to simply implement coerce to flip the operands, so that when 5 retries the operation, 'Hello' will be the receiver:
class String
def coerce(other)
[self, other]
end
end
5 * 'Hello'
#=> 'HelloHelloHelloHelloHello'
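The same protocol can be sketched for a user-defined numeric-like type, without reopening String. `Meters` is a hypothetical class invented for this example; only `coerce` and the way Integer#* falls back to it are standard Ruby behavior:

```ruby
# Hypothetical value class used to illustrate the coerce protocol.
class Meters
  attr_reader :value

  def initialize(value)
    @value = value
  end

  # Meters is the receiver here, so it decides how to handle the argument.
  def *(other)
    factor = other.is_a?(Meters) ? other.value : other
    Meters.new(value * factor)
  end

  # Called by Integer#* when it does not know how to handle a Meters argument.
  # Returns [a, b] such that a * b gives the desired result.
  def coerce(other)
    [Meters.new(other), self]
  end
end

puts (Meters.new(2) * 3).value   # 6, Meters#* handles it directly
puts (3 * Meters.new(2)).value   # 6, Integer#* asks Meters to coerce, then retries
```

Here `coerce` wraps the integer in a Meters object, so the retried multiplication has a receiver that knows what to do.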
Okay, OO is inherently non-commutative, but we can make it commutative using cooperation, so why isn't it done? I must admit, I don't have a clear-cut answer to this question, but I can offer two educated guesses:
coerce is specifically intended for numeric coercion in arithmetic operations. (Note the protocol is defined in Numeric.) A string is not a number, nor is string concatenation an arithmetic operation.
We just don't expect * to be commutative with wildly different types such as Integer and String.
Of course, just for fun, we can actually observe that there is a certain symmetry between Integers and Strings. In fact, you can implement a common version of Integer#* for both String and Integer arguments, and you will see that the only difference is in what we choose as the "zero" element:
class Integer
def *(other)
zero = case other
when Integer then 0
when String then ''
when Array then []
end
times.inject(zero) {|acc, _| acc + other }
end
end
5 * 6
#=> 30
5 * 'six'
#=> 'sixsixsixsixsix'
5 * [:six]
#=> [:six, :six, :six, :six, :six]
The reason for this is, of course, that the set of strings with the concatenation operation and the empty string as the identity element form a monoid, just like arrays with concatenation and the empty array and just like integers with addition and zero. Since all three are monoids, and our "multiplication as repeated addition" only requires monoid operations and laws, it will work for all monoids.
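The monoid idea can also be shown without redefining Integer#*, by passing the identity element explicitly. `repeat` is a hypothetical helper written for this sketch:

```ruby
# Hypothetical helper: "multiplication as repeated addition" over any monoid.
# zero is the identity element; + is the monoid operation.
def repeat(n, element, zero)
  n.times.inject(zero) { |acc, _| acc + element }
end

p repeat(5, 6, 0)        # 30
p repeat(5, 'six', '')   # "sixsixsixsixsix"
p repeat(5, [:six], [])  # [:six, :six, :six, :six, :six]
p repeat(0, 'six', '')   # "": zero repetitions yield the identity element
```

Nothing in `repeat` mentions integers, strings, or arrays; it only relies on `+` and an identity element.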
Note: Python has an interesting twist on this double-dispatch idea. Just like in Ruby, if you write
a * b
Python will re-write that into a message send:
a.__mul__(b)
However, if a can't handle the operation, instead of cooperating with b, it cooperates with Python by returning NotImplemented. Now, Python will try with b, but with a slight twist: it will call
b.__rmul__(a)
This allows b to know that it was on the right side of the operator. It doesn't matter much for multiplication (because multiplication is (usually but not always, see e.g. matrix multiplication) commutative), but remember that operator symbols are distinct from their operations. So, the same operator symbol can be used for operations that are commutative and ones that are non-commutative. Example: + is used in Ruby for addition (2 + 3 == 3 + 2) and also for concatenation ('Hello' + 'World' != 'World' + 'Hello'). So, it is actually advantageous for an object to know whether it was the right or left operand.
This is because operators are also methods. (There are exceptions, as Cary listed in the comments, which I wasn't aware of.)
For example:
array << 4 is the same as array.<<(4)
array[2] is the same as array.[](2)
array[2] = 'x' is the same as array.[]=(2, 'x')
In your example:
5 * "Hello" is 5.*("Hello")
Meanwhile
"hello" * 5 is "hello".*(5)
An integer's * method cannot take a string as its argument.
If you ever dabble in Python, try 5 * "hello" and "hello" * 5: both work. Pretty interesting that Ruby has this behavior, to be honest.
Well, as Muntasir Alam has already said, Fixnum does not have a method named * that takes a string as its argument, so 5 * "Hello" produces that error. But, for fun, we can actually make 5 * "Hello" work by adding that missing method to the Fixnum class.
# Note: Fixnum was merged into Integer in Ruby 2.4; on newer Rubies, open Integer instead.
class Fixnum                  # open the class
  def *(str)                  # override the * method
    if str.is_a? String       # if the argument is a String
      temp = ""
      self.times do
        temp << str
      end
      temp
    else                      # if the argument is not a String
      mul = 0
      self.times do
        mul += str
      end
      mul
    end
  end
end
now
puts 5*"Hello" #=> HelloHelloHelloHelloHello
puts 4*5 #=> 20
puts 5*10.4 #=> 52.0
Well, that was just to show that the reverse is also possible. But it adds a lot of overhead, and I think we should avoid it at all costs.

How to unfreeze an object in Ruby?

In Ruby, there is Object#freeze, which prevents further modifications to the object:
class Kingdom
attr_accessor :weather_conditions
end
arendelle = Kingdom.new
arendelle.frozen? # => false
arendelle.weather_conditions = 'in deep, deep, deep, deep snow'
arendelle.freeze
arendelle.frozen? # => true
arendelle.weather_conditions = 'sun is shining'
# !> RuntimeError: can't modify frozen Kingdom
script = 'Do you want to build a snowman?'.freeze
script[/snowman/] = 'castle of ice'
# !> RuntimeError: can't modify frozen String
However, there is no Object#unfreeze. Is there a way to unfreeze a frozen kingdom?
Update: As of Ruby 2.7 this no longer works!
Yes and no. There isn't any direct way using the standard API. However, with some understanding of what #freeze does, you can work around it. Note: everything here is an implementation detail of MRI's current version and might be subject to change.
Objects in CRuby are stored in a struct RVALUE.
Conveniently, the very first thing in the struct is VALUE flags;.
All Object#freeze does is set a flag, called FL_FREEZE, which is actually equal to RUBY_FL_FREEZE. RUBY_FL_FREEZE is basically bit 11 of the flags (counting from zero).
All you have to do to unfreeze the object is unset that bit.
To do that, you could use Fiddle, which is part of the standard library and lets you tinker with the language on C level:
require 'fiddle'
class Object
def unfreeze
Fiddle::Pointer.new(object_id * 2)[1] &= ~(1 << 3)
end
end
Non-immediate value objects in Ruby are stored at address = their object_id * 2. Note that it's important to make the distinction, so you are aware that this won't let you unfreeze Fixnums, for example.
Since we want to change bit 11 of the flags, we have to work with bit 3 of the second byte. Hence we access the second byte with [1].
~(1 << 3) shifts 1 three positions and then inverts the result. This way the only bit that is zero in the mask is bit 3, and all others are ones.
Finally, we just apply the mask with bitwise and (&=).
foo = 'A frozen string'.freeze
foo.frozen? # => true
foo.unfreeze
foo.frozen? # => false
foo[/ (?=frozen)/] = 'n un'
foo # => 'An unfrozen string'
No, according to the documentation for Object#freeze:
There is no way to unfreeze a frozen object.
The frozen state is stored within the object. Calling freeze sets the frozen state and thereby prevents further modification. This includes modifications to the object's frozen state.
Regarding your example, you could assign a new string instead:
script = 'Do you want to build a snowman?'
script.freeze
script = script.dup if script.frozen?
script[/snowman/] = 'castle of ice'
script #=> "Do you want to build a castle of ice?"
Ruby 2.3 introduced String#+@, so you can write +str instead of str.dup if str.frozen?
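A short sketch of how unary plus behaves, assuming Ruby 2.3 or later; `mutable` is a hypothetical wrapper added only so the behavior has a name:

```ruby
# Hypothetical helper: returns a string that is safe to modify.
# Unary plus returns the receiver itself when it is not frozen,
# and an unfrozen duplicate when it is.
def mutable(str)
  +str
end

frozen = 'Do you want to build a snowman?'.freeze
script = mutable(frozen)
script[/snowman/] = 'castle of ice'

puts script           # Do you want to build a castle of ice?
puts frozen           # Do you want to build a snowman?
puts frozen.frozen?   # true: the original object is still frozen
```

Note that the frozen object itself is never unfrozen; the variable simply ends up pointing at a modifiable copy.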
frozen_object = %w[hello world].freeze
frozen_object.concat(['and universe']) # FrozenError (can't modify frozen Array)
frozen_object.dup.concat(['and universe']) # ['hello', 'world', 'and universe']
As noted above, assigning a copy back to the same variable also effectively unfreezes it.
This can be done using the .dup method:
var1 = var1.dup
It can also be achieved using:
var1 = Marshal.load(Marshal.dump(var1))
I have been using Marshal.load(Marshal.dump(...)).
I have not used .dup and only learned about it through this post.
I do not know what differences, if any, there are between Marshal.load(Marshal.dump(...)) and .dup.
If they do the same thing or .dup is more powerful, then stylistically I like .dup better. .dup states what to do (copy this thing) but does not say how to do it, whereas Marshal.load(Marshal.dump(...)) is not only excessively verbose, but also states how to do the duplication. I am not a fan of specifying the HOW part if the HOW part is irrelevant to me. I want to duplicate the value of the variable; I do not care how.
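One practical difference between the two approaches shows up with nested objects: dup copies only the outer object, while a Marshal round trip rebuilds the whole structure. The sketch below illustrates this; `deep_copy` is a hypothetical helper name:

```ruby
# Hypothetical helper: copies an object graph by serializing and deserializing it.
def deep_copy(obj)
  Marshal.load(Marshal.dump(obj))
end

original = ['hello'.freeze, 'world'.freeze].freeze

shallow = original.dup
deep    = deep_copy(original)

puts shallow.frozen?                 # false: the outer array is a new, unfrozen object
puts shallow[0].equal?(original[0])  # true: the elements are still the same frozen strings
puts deep[0].equal?(original[0])     # false: the elements were rebuilt as new objects
puts deep == original                # true: same content
```

So dup is enough when only the outer object needs to be modified, while the Marshal round trip also yields fresh copies of the elements (at a higher cost, and only for objects that can be marshalled).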

Size of class in bytes

Is there a method to see the size of allocated memory for a class in ruby?
I have built a custom class and I would like to know its size in memory. So is there a function similar to sizeof() in C?
I am simply creating a new instance of the class like so
test = MyClass.new
and trying to find a method to print out the amount of memory that has been allocated for it.
Is this even possible in ruby?
There is no language feature that calculates the size of a class in the same way as C.
The memory size of an object is implementation dependent. It depends on the implementation of the base class Object. It is also not simple to estimate the memory used. For example, strings can be embedded in an RString structure if they are short, but stored in the heap if they are long (see the article "Never create Ruby strings longer than 23 characters").
The memory taken by some objects has been tabulated for different Ruby implementations: see "Memory footprint of objects in Ruby 1.8, EE, 1.9, and OCaml".
Finally, the object size may differ even between two objects of the same class, since it is possible to add extra instance variables arbitrarily, without hardcoding which instance variables are present. For example, see instance_variable_get and instance_variable_set.
If you use MRI ruby 1.9.2+, there is a method you can try (be warned that it is looking at only part of the object, this is obvious from the fact that integers and strings appear to have zero size):
irb(main):176:0> require 'objspace'
=> true
irb(main):176:0> ObjectSpace.memsize_of(134)
=> 0
irb(main):177:0> ObjectSpace.memsize_of("asdf")
=> 0
irb(main):178:0> ObjectSpace.memsize_of({a: 4})
=> 184
irb(main):179:0> ObjectSpace.memsize_of({a: 4, b: 5})
=> 232
irb(main):180:0> ObjectSpace.memsize_of(/a/.match("a"))
=> 80
You can also try memsize_of_all (note that it looks at the memory usage of the whole interpreter, and overwriting a variable does not appear to delete the old copy immediately):
irb(main):190:0> ObjectSpace.memsize_of_all
=> 4190347
irb(main):191:0> asdf = 4
=> 4
irb(main):192:0> ObjectSpace.memsize_of_all
=> 4201350
irb(main):193:0> asdf = 4
=> 4
irb(main):194:0> ObjectSpace.memsize_of_all
=> 4212353
irb(main):195:0> asdf = 4.5
=> 4.5
irb(main):196:0> ObjectSpace.memsize_of_all
=> 4223596
irb(main):197:0> asdf = "a"
=> "a"
irb(main):198:0> ObjectSpace.memsize_of_all
=> 4234879
You should be very careful because there is no guarantee when the Ruby interpreter will perform garbage collection. While you might use this for testing and experimentation, it is recommended that this is NOT used in production!
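For a custom class, the same approach might look like the sketch below. `Sample` is a hypothetical class, and the numbers printed depend on the Ruby version and platform, so no particular values should be expected:

```ruby
require 'objspace'

# Hypothetical class whose instances carry a configurable number of instance variables.
class Sample
  def initialize(count)
    count.times { |i| instance_variable_set("@field#{i}", i) }
  end
end

small = ObjectSpace.memsize_of(Sample.new(1))
large = ObjectSpace.memsize_of(Sample.new(50))

# Sizes are implementation dependent; only the relationship between them is of interest.
puts "1 instance variable:   #{small} bytes"
puts "50 instance variables: #{large} bytes"
```

This also illustrates the earlier point that two objects of the same class need not occupy the same amount of memory.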

In Ruby, why does inspect() print out some kind of object id which is different from what object_id() gives?

When the p function is used to print out an object, it may give an ID, and it is different from what object_id() gives. What is the reason for the different numbers?
Update: 0x4684abc is different from 36971870, which is 0x234255E
>> a = Point.new
=> #<Point:0x4684abc>
>> a.object_id
=> 36971870
>> a.__id__
=> 36971870
>> "%X" % a.object_id
=> "234255E"
The default implementation of inspect calls the default implementation of to_s, which just shows the hexadecimal value of the object directly, as seen in the Object#to_s docs (click on the method description to reveal the source).
Meanwhile the comments in the C source underlying the implementation of object_id shows that there are different “namespaces” for Ruby values and object ids, depending on the type of the object (e.g. the lowest bit seems to be zero for all but Fixnums). You can see that in Object#object_id docs (click to reveal the source).
From there we can see that in the “object id space” (returned by object_id) the ids of objects start from the second bit on the right (with the first bit being zero), but in “value space” (used by inspect) they start from the third bit on the right (with the first two bits zero). So, to convert the values from the “object id space” to the “value space”, we can shift the object_id to the left by one bit and get the same result that is shown by inspect:
> '%x' % (36971870 << 1)
=> "4684abc"
> a = Foo.new
=> #<Foo:0x5cfe4>
> '%x' % (a.object_id << 1)
=> "5cfe4"
0x234255E
=>36971870
It's not different, it's the hexadecimal representation of the memory address:-)

Resources