What is the size of a boolean in Ruby? - ruby

What is the size of a boolean data type in Ruby? There was a long discussion on Ruby Forum regarding this, but there was no final answer that I could get from it.
Also, how can I find out what size it is?
For example, if I stored it in an array, how much memory would it take?
a=[true, true]
vs
a=[1,1]

Serializing tells us that
Marshal.dump([true,true]).length # => 6
Marshal.dump(true).length # => 3
Marshal.dump([1,1]).length # => 8
Marshal.dump(1).length # => 4
I'm pretty sure that these values do not represent real memory usage, but [true,true] seems to serialize more compactly than [1,1].
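For MRI specifically, the objspace standard library gives another rough way to compare the two arrays. This is only a minimal sketch, assuming MRI (CRuby); the numbers it prints are implementation details, and other implementations may not support ObjectSpace.memsize_of at all.

```ruby
require 'objspace'

bools = [true, true]
ints  = [1, 1]

# Size MRI reports for each array object itself, in bytes.
bool_array_size = ObjectSpace.memsize_of(bools)
int_array_size  = ObjectSpace.memsize_of(ints)

# true and small integers are immediate values in MRI: they sit directly
# in the array's slots and own no heap memory of their own.
true_size = ObjectSpace.memsize_of(true)
one_size  = ObjectSpace.memsize_of(1)

puts "array of booleans: #{bool_array_size} bytes"
puts "array of integers: #{int_array_size} bytes"
puts "true: #{true_size} bytes, 1: #{one_size} bytes"
```

On MRI both arrays should report the same size, since each element occupies one slot regardless of whether it holds true or 1.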

The Ruby Language Specification does not specify any particular representation for any object. Every Ruby Implementation is free to choose any representation it wants.
Note also that not being able to tell the representation of an object is the defining characteristic of Object-Oriented Data Abstraction. If it were possible to tell the size of a Boolean, Ruby wouldn't be object-oriented!

Related

Ruby object to_s what is the encoding of the object id?

In Ruby, the to_s on an object includes an encoding of the object's id.
[2] pry(main)> shape = Shape.new(4,4)
=> #<Shape:0x00007fac5eb6afc8 @num_sides=4, @side_length=4>
In the documentation it says
Returns a string representing obj. The default to_s prints the object’s class and an encoding of the object id.
https://apidock.com/ruby/Object/to_s
In the example above, the encoding of the object id is 0x00007fac5eb6afc8.
In How does object_id assignment work? they explain
In MRI the object_id of an object is the same as the VALUE that represents the object on the C level.
So I compared to the object_id and it is not the same as the encoding of the object id.
[2] pry(main)> shape = Shape.new(4,4)
=> #<Shape:0x00007fac5eb6afc8 @num_sides=4, @side_length=4>
[3] pry(main)> shape.object_id
=> 70189150066660
What exactly is the encoding of the object id? It does not appear to be the object_id.
Think of the object_id, or __id__ as the "pointer" for the object. It is not technically a pointer, but does contain a unique value that can be used to retrieve the internal C VALUE.
There are patterns to the value it has for some data types, as you can see with its hexadecimal representation with to_s. I will not go into all the details, as there are already numerous answers on SO explaining them, already linked from the comments, but integers (up to FIXNUM_MAX) have predictable values, and special constants like true, false, and nil will always have the same object_id in every run.
To put it simply, it is nothing more than a number, shown as a hexadecimal (base 16) value, not any actual "encoding" or cypher.
Going to expand upon this a bit more in light of your latest edits to the question. As you posted, the hexadecimal number you see in to_s is the value of the internal C VALUE of the object. VALUE is a C data type (an unsigned, pointer-sized number) that every Ruby object is represented as in C code. As @Stefan pointed out in a comment, for non-integer types (I speak only for MRI), it is twice the value of the object_id. Not that you probably care, but you can shift the bits of an integer to predict the value for those.
Therefore, using your example.
A value of 0x00007fac5eb6afc8 is simple hexadecimal notation for a number. It uses a base 16 counting system as opposed to the base 10 decimal system we are more used to in everyday life. It is simply a different way of looking at the same number.
So, using that logic.
a = 0x00007fac5eb6afc8
#=> 140378300133320 # Decimal representation
a /= 2 # Remember, non-integers are half of this value
#=> 70189150066660 # Your object_id
The best answer you can get is: You don't know, and you shouldn't need to.
Ruby guarantees exactly three things about object IDs:
An object has the same ID during its lifetime.
No two objects have the same ID at the same time.
IDs are integers.
In particular, this means that you cannot rely on a specific object having a specific ID (for example, nil having ID 8). It also means that IDs can be re-used. You should think of it as nothing but an opaque identifier.
And, as you quoted, the default Object#to_s uses "some" encoding of the ID.
And that is all you know, and all you should ever rely on. In particular, you should never try to parse IDs or Object#to_s.
So, the ID part of Object#to_s is "some unspecified encoding" of the ID, which itself is "some opaque identifier".
Everything else is deliberately left unspecified, so that different implementations can make different choices that make sense for their specific needs. For example, it would be stupid to tie object IDs to memory addresses, because implementations like JRuby, Opal, IronRuby, MagLev, and Topaz run on platforms where the concept of "memory address" doesn't even exist! And Rubinius uses a moving garbage collector, where objects can move around in memory and thus their address changes.
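The three guarantees listed above can be checked without assuming anything about how a particular implementation computes the IDs. A minimal sketch (the variable names are only for illustration):

```ruby
a = Object.new
b = Object.new

first_id  = a.object_id
second_id = a.object_id

# 1. An object keeps the same ID for its whole lifetime.
same_over_time = (first_id == second_id)

# 2. Two objects that are alive at the same time never share an ID.
distinct = (a.object_id != b.object_id)

# 3. IDs are integers.
is_integer = a.object_id.is_a?(Integer)

puts same_over_time, distinct, is_integer

# The default inspect output embeds some unspecified encoding of the ID;
# treat it as opaque text rather than something to parse.
puts a.inspect
```

Nothing here depends on the actual numeric values, which is the only portable way to use object IDs.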

Why can't I overwrite self in the Integer class?

I want to be able to write number.incr, like so:
num = 1; num.incr; num
#=> 2
The error I'm seeing states:
Can't change the value of self
If that's true, how do bang! methods work?
You cannot change the value of self
An object is a class pointer and a set of instance variables (note that this link is an old version of Ruby, because it's dramatically simpler, and thus better for explanatory purposes).
"Pointing" at an object means you have a variable which stores the object's location in memory. Then to do anything with the object, you first go to the location in memory (we might say "follow the pointer") to get the object, and then do the thing (e.g. invoke a method, set an ivar).
All Ruby code everywhere is executing in the context of some object. This is where your instance variables get saved, it's where Ruby looks for methods that don't have a receiver (e.g. $stdout is the receiver in $stdout.puts "hi", and the current object is the receiver in puts "hi"). Sometimes you need to do something with the current object. The way to work with objects is through variables, but what variable points at the current object? There isn't one. To fill this need, the keyword self is provided.
self acts like a variable in that it points at the location of the current object. But it is not like a variable, because you can't assign it new value. If you could, the code after that point would suddenly be operating on a different object, which is confusing and has no benefits over just using a variable.
Also remember that the object is tracked by variables which store memory addresses. What is self = 2 supposed to mean? Does it only mean that the current code operates as if it were invoked on 2? Or does it mean that all variables pointing at the old object now have their values updated to point at the new one? It isn't really clear, but the former unnecessarily introduces an identity crisis, and the latter is prohibitively expensive and introduces situations where it's unclear what is correct (I'll go into that a bit more below).
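To see the restriction in action, here is a small sketch. Assigning to self is rejected when the code is parsed, so the offending snippet is evaluated from a string in order to catch the error; the second half shows the ordinary alternative of rebinding a variable.

```ruby
# self = 2 never runs: the parser refuses it, which surfaces as a SyntaxError.
error = begin
  eval('self = 2')
  nil
rescue SyntaxError => e
  e
end

puts error.class   # SyntaxError

# The usual alternative: compute a new value and rebind the variable.
num = 1
num = num.succ
puts num           # 2
```

Rebinding changes which object num points at; the Integer 1 itself is never modified.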
You cannot mutate Fixnums
Some objects are special at the C level in Ruby (false, true, nil, fixnums, and symbols).
Variables pointing at them don't actually store a memory location. Instead, the address itself stores the type and identity of the object. Wherever it matters, Ruby checks to see if it's a special object (e.g. when looking up an instance variable), and then extracts the value from it.
So there isn't a spot in memory where the object 123 is stored. Which means self contains the idea of Fixnum 123 rather than a memory address like usual. As with variables, it will get checked for and handled specially when necessary.
Because of this, you cannot mutate the object itself (though it appears they keep a special global variable to allow you to set instance variables on things like Symbols).
Why are they doing all of this? To improve performance, I assume. A number stored in a register is just a series of bits (typically 32 or 64), which means there are hardware instructions for things like addition and multiplication. That is to say, the ALU is wired to perform these operations in a single clock cycle, rather than writing the algorithms in software, which would take many orders of magnitude longer. By storing them like this, they avoid the cost of storing the object and looking it up in memory, and they gain the advantage that they can directly add the two pointers using hardware. Note, however, that there are still some additional costs in Ruby that you don't have in C (e.g. checking for overflow and converting the result to Bignum).
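The immutability described in this section can be observed from plain Ruby. A minimal sketch, assuming a reasonably recent MRI where integers are frozen (the instance variable name @note is made up for the example):

```ruby
n = 123

# Integers report themselves as frozen.
frozen = n.frozen?

# Trying to attach state to the integer fails instead of mutating it.
ivar_error = begin
  n.instance_variable_set(:@note, 'hello')
  nil
rescue StandardError => e
  e
end

# Arithmetic never changes 123; it yields a different value.
m = n + 1

puts frozen
puts ivar_error.class
puts [n, m].inspect
```

The exact error class has changed between Ruby versions, so the sketch rescues StandardError rather than naming one.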
Bang methods
You can put a bang at the end of any method. It doesn't require the object to change, it's just that people usually try to warn you when you're doing something that could have unexpected side-effects.
class C
  def initialize(val)
    @val = val # => 12
  end # => :initialize
  def bang_method!
    "My val is: #{@val}" # => "My val is: 12"
  end # => :bang_method!
end # => :bang_method!
c = C.new 12 # => #<C:0x007fdac48a7428 @val=12>
c.bang_method! # => "My val is: 12"
c # => #<C:0x007fdac48a7428 @val=12>
Also, there are no bang methods on integers. It wouldn't fit with the paradigm.
Fixnum.instance_methods.grep(/!$/) # => [:!]
# Okay, there's one, but it's actually a boolean negation
1.! # => false
# And it's not a Fixnum method, it's an inherited boolean operator
1.method(:!).owner # => BasicObject
# In reality, you call it this way, and the interpreter translates it
!1 # => false
Alternatives
Make a wrapper object: I'm not going to advocate this one, but it's the closest to what you're trying to do. Basically, create your own class, which is mutable, and then make it look like an integer. There's a great blog post walking through this at http://blog.rubybestpractices.com/posts/rklemme/019-Complete_Numeric_Class.html which will get you 95% of the way there.
Don't depend directly on the value of a Fixnum: I can't give better advice than this without knowing what you're trying to do / why you feel this is a need.
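As a rough illustration of the wrapper idea, here is a much smaller sketch than the linked post describes. Counter, incr, and value are names invented for this example, and the class makes no attempt to behave like a full numeric type.

```ruby
# Hypothetical wrapper: a mutable object holding an immutable Integer.
class Counter
  attr_reader :value

  def initialize(value = 0)
    @value = value
  end

  # Rebinds the instance variable to a new Integer; the old Integer is untouched.
  def incr
    @value += 1
    self
  end

  def to_i
    @value
  end

  def to_s
    @value.to_s
  end
end

num = Counter.new(1)
num.incr
puts num.value   # 2
```

The mutation happens to the wrapper's instance variable, which is why num.incr can work here while it cannot work on a bare integer.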
Also, you should show your code when you ask questions like this. I misunderstood how you were approaching it for a long time.
It's simply impossible to change self to another object. self is the receiver of the message send. There can be only one.
If that's true, how do bang! methods work?
The bang (!) is simply part of the method name. It has absolutely no special meaning whatsoever. It is a convention among Ruby programmers to name surprising variants of less surprising methods with a bang, but that's just that: a convention.
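A short sketch of how an existing bang method typically does its work: it changes the contents of the receiver, it does not replace self with another object. String#upcase and String#upcase! are used here only as a familiar pair.

```ruby
s = 'abc'.dup            # dup guarantees a mutable String
original_id = s.object_id

t = s.upcase             # returns a new String and leaves s alone
s_after_upcase = s.dup   # snapshot of s at this point

s.upcase!                # modifies the contents of the same object
same_object = (s.object_id == original_id)

puts t
puts s_after_upcase
puts s
puts same_object
```

An Integer has no internal contents that could be changed this way, which is why no comparable incr! can exist for it.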

Is it possible to redefine 0 in ruby?

I'm not actually going to use this in anything in case it does actually work, but is it possible to redefine 0 to act as 1 in Ruby and 1 to act as 0? Where does Fixnum actually hold its value?
No, I don't think so. I'd be very surprised if you managed to. If you start overriding Fixnum's methods/operators, you maaaybe might get near that (i.e. override + so that 1+5 => 5, 0+5 => 6 etc.), but you will not get full replacement of literal '0' with value 1. At least marshalling to native would expose the real 0 value of the Fixnum(0).
To be honest, I'm not really sure if you can even override the core operations like + op on a Fixnum. That could break so many things..
As far as I remember from the 1.8.3 source, simple integers and doubles are held right inside a 'value' and are copied all around *). There is no singular "0", "1" or "1000" value. There is no extra dereference that would allow you to swap all the values in one shot. I doubt it changed in 1.9 and I doubt anyone got any weird idea about that in 2.0. But I don't actually know. Still, that would be strange. No platform I know interns integers and floats. Strings, sometimes array literals, but numbers?
So, sorry, no #define true false jokes :)
--
*) clarification from Jörg W Mittag (thanks, this is exactly what I was referring to):
(..) Fixnums do not have a place in memory, their pointer value is "magic" (in that it cannot possibly occur in a Ruby program) and treated specially by the runtime system. Read up on "tagged pointer representation", e.g. here.
Assignment does not alias Fixnum objects. There is effectively only one Fixnum object instance for any given integer value, so, for example, you cannot add a singleton method to a Fixnum. Any attempt to add a singleton method to a Fixnum object will raise a TypeError. Source
That pretty much means you can't edit a Fixnum and therefore can't redefine 0 or 1 in native Ruby.
Though as these Fixnums are also Objects, they have unique object ids that clearly reference them somewhere in memory. See BasicObject#__id__
If you can locate the memory space where the 0 and 1 objects are and switch these, you should have effectively switched 0 and 1 behaviour in Ruby, as now either will reference the other object.
So to answer your question: No, redefining Fixnums is not possible in Ruby; switching their behaviour should be possible, though.
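The TypeError mentioned in the quoted passage is easy to reproduce. A minimal sketch, assuming MRI; the method name to_one is hypothetical and exists only to trigger the error.

```ruby
zero = 0

singleton_error = begin
  # Attempt to give the object 0 a method of its own.
  zero.define_singleton_method(:to_one) { 1 }
  nil
rescue TypeError => e
  e
end

# Every occurrence of the same small integer is the very same object.
same_object = 0.equal?(1 - 1)

puts singleton_error.class
puts same_object
```

Because no per-object method table can be attached to 0, there is no hook at the Ruby level for making one particular integer behave like another.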

Accessing objects memory address in ruby..?

Is there any way in Ruby to get the memory address of objects?
(i = 5)
Is it possible to get the memory address of that object 5?
I have been trying to get this over some time.
Yes.
From "Fiddling with Ruby’s Fiddle":
"You can get the actual pointer value of an object by taking the object id, and doing a bitwise shift to the left. This will give you the pointer (or memory location) of the ruby object in memory."
Using your example of i = 5 it could be done like so:
i = 5
i_ptr_int = i.object_id << 1
=> 22
"In Ruby, why does inspect() print out some kind of object id which is different from what object_id() gives?" has more info about object_id, including a brief introduction to the C source underlying the implementation which you might find helpful.
Take a look at "Fiddle" for some other cool things you can do.
Ruby Memory Validator should be able to pull that off but it's not free.
Aman Gupta patched Joe Damato's memprof but it seems to be a work in progress and I never got it to run on my machine. Joe has a couple of really good posts about memprof and other low level stuff on his blog.
Now I'm not so sure they really can. Integers are stored as a Fixnum and Fixnum is not a usual Ruby object, it just looks that way. Ruby uses a clever speed-up trick with the object_id to make Fixnum objects immediate values. The number is in fact stored in the object_id itself. That's why two different Fixnums containing the same value have the same object_id.
>> x=5
=> 5
>> y=5
=> 5
>> x.object_id
=> 11
>> y.object_id
=> 11
>> z=4711
=> 4711
>> z.object_id
=> 9423
The object_id of a Fixnum is actually created by bit shifting to the left and then setting the least significant bit.
5 is 0b101 and the object_id for 5 is 11 and 11 in binary is 0b1011.
4711 is 0b0001001001100111, shift left and set the bit and you get 0b0010010011001111 and that is 9423 which happens to be the object_id for z above.
This behaviour is most probably implementation specific but I don't know of a Ruby implementation that doesn't handle Fixnum this way.
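The shift-and-set-bit rule described above can be written as a one-line helper and compared with what the interpreter reports. This is a sketch that assumes MRI (CRuby); expected_fixnum_id is a name made up for the example, and other implementations are free to number integers differently.

```ruby
# Assumption: MRI's tagging scheme for small integers.
def expected_fixnum_id(n)
  (n << 1) | 1   # shift left by one bit, then set the lowest bit
end

[5, 4711].each do |n|
  puts format('%d -> expected %d, actual %d', n, expected_fixnum_id(n), n.object_id)
end
```

For 5 and 4711 the helper yields 11 and 9423, the same values shown in the session above.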
There are at least three more immediate objects in Ruby and that's false, true and nil.
>> false.object_id
=> 0
>> true.object_id
=> 2
>> nil.object_id
=> 4
I don't know of a way of having the exact address, but maybe you're looking for something like the object_id method?
Extract from its documentation
Returns an integer identifier for obj.
The same number will be returned on all calls to id for a given object, and no two active objects will share an id
Example:
> 5.object_id
=> 11
> true.object_id
=> 2
Ruby Memory Validator does give you the memory address for the object.
Joe Damato's work (http://timetobleed.com/plugging-ruby-memory-leaks-heapstack-dump-patches-to-help-take-out-the-trash) and (http://timetobleed.com/memprof-a-ruby-level-memory-profiler) is based on the work Software Verification did to create a Ruby memory inspection API (http://www.softwareverify.com/ruby/customBuild/index.html).
Joe describes that on his blog. Therefore Joe's work should also return the appropriate addresses. I'm not fully up to speed with the latest version of Joe's work - he only told me about the first version, not the latest version, but nonetheless, if you are tracking memory allocations in the underpinnings of Ruby, you are tracking the addresses of the objects that hold whatever it is you are allocating.
That doesn't mean you can dereference the address and read the data value you expect to find at that address. Dereferencing the address will point you to the internals of a basic Ruby Object. Ruby objects are a basic object which then store additional data alongside, so knowing the actual address is not very useful unless you are writing a tool like Ruby Memory Validator or memprof.
How do I know the above about Ruby Memory Validator and the API we released? I designed Ruby Memory Validator. I also wrote the assembly language bits that intercept the Ruby calls that allocate the memory.
What exactly are you trying to do?
Keep in mind that a Ruby object is not directly analogous to a variable in a language like C or C++. For example:
a = "foo"
b = a
b[2] = 'b'
b
=> "fob"
a
=> "fob"
a == b
=> true
a.object_id
=> 23924940
b.object_id
=> 23924940
a.object_id == b.object_id
=> true
Even though a and b are separate variables, they are references to the same underlying data and have the same object_id.
If you find yourself needing to take the address of a variable, there is probably an easier approach to whatever you are trying to do.
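If the goal is simply to tell whether two variables refer to the same object, or to get an independent copy, that can be done without addresses. A small sketch extending the example above with a third variable c:

```ruby
a = 'foo'.dup   # dup guarantees a mutable String
b = a           # second reference to the same object
c = a.dup       # independent copy with the same contents

b[2] = 'b'      # mutate through b

puts a             # fob
puts c             # foo
puts a.equal?(b)   # true
puts a.equal?(c)   # false
```

equal? compares object identity, so it answers the "same object?" question directly, while dup is the way to avoid sharing.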
Since you indicated (buried in a comment somewhere) that you're really just trying to understand how Ruby references things, I think things work as follows:
A VALUE in Ruby's C API represents an object (a nil, a Fixnum or a Boolean) or a pointer to an Object. The VALUE contains a 3-bit tag indicating which of these it is, and contains the value (for the first 3) or a direct memory pointer (for an Object). There's no way to get at the VALUE directly in Ruby. (I'm not sure if the object_id is the same or different.)
Note that JRuby operates differently.

Does ruby's object_id method refer to the memory location?

Or does this method just indicate a unique integer that each object has?
It is a combination of many parameters, value, object type, place in memory.
More can be read here
It isn't a direct reference to the memory location and the "encoding" is specific to a particular Ruby implementation. If you can read C code, you may find it instructive to look at the rb_obj_id and id2ref methods in gc.c in the Ruby 1.8.6 source. You can also read more about the "encoding" in the "Objects embedded in VALUE" section of the partial translation of the Ruby Hacking Guide chapter 2.
It's worth noting that you can perform a reverse-lookup of object IDs using:
ObjectSpace._id2ref(object_id)
For example:
ObjectSpace._id2ref(0) #=> false
ObjectSpace._id2ref(1) #=> 0
ObjectSpace._id2ref(2) #=> true
ObjectSpace._id2ref(3) #=> 1
ObjectSpace._id2ref(4) #=> nil
Well, it depends on what you mean by "ruby" ;) In JRuby it's just a unique integer as far as I can tell.
Also, things like numbers aren't the memory location. I forget all the details and am sure someone will give them to you.
irb(main):020:0> 1.object_id
=> 3
irb(main):021:0> (2-1).object_id
=> 3
In "normal" ruby (MRI 1.8.x and 1.9.x) it's just a unique value.
This is also the case in IronRuby
