Accessing objects memory address in ruby..? - ruby

Is there any way in Ruby to get the memory address of objects?
(i = 5)
Is it possible to get the memory address of that object 5?
I have been trying to get this over some time.

Yes.
From "Fiddling with Ruby’s Fiddle":
"You can get the actual pointer value of an object by taking the object id, and doing a bitwise shift to the left. This will give you the pointer (or memory location) of the ruby object in memory."
Using your example of i = 5 it could be done like so:
i = 5
i_ptr_int = i.object_id << 1
=> 22
"In Ruby, why does inspect() print out some kind of object id which is different from what object_id() gives?" has more info about object_id, including a brief introduction to the C source underlying the implementation which you might find helpful.
Take a look at "Fiddle" for some other cool things you can do.
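As a rough sketch of the Fiddle approach (assuming MRI with the fiddle library available, and a heap object rather than an immediate like 5), Fiddle.dlwrap returns an object's address as an Integer and Fiddle.dlunwrap turns that address back into the object:
```ruby
require "fiddle"

# Sketch only: assumes MRI with the fiddle library installed.
obj = Object.new

address = Fiddle.dlwrap(obj)        # Integer address of the object
same    = Fiddle.dlunwrap(address)  # turn the address back into the object

puts format("0x%x", address)
puts same.equal?(obj)               # true: both names refer to one object
```
The address is only useful for inspection; it can become stale if the garbage collector frees or moves the object.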

Ruby Memory Validator should be able to pull that off but it's not free.
Aman Gupta patched Joe Damato's memprof, but it seems to be a work in progress and I never got it to run on my machine. Joe has a couple of really good posts about memprof and other low-level stuff on his blog.
Now I'm not so sure they really can. Integers are stored as Fixnums, and a Fixnum is not a usual Ruby object; it just looks that way. Ruby uses a clever speed-up trick with the object_id to make Fixnum objects immediate values. The number is in fact stored in the object_id itself. That's why two different Fixnums containing the same value have the same object_id.
>> x=5
=> 5
>> y=5
=> 5
>> x.object_id
=> 11
>> y.object_id
=> 11
>> z=4711
=> 4711
>> z.object_id
=> 9423
The object_id of a Fixnum is actually created by bit shifting to the left and then setting the least significant bit.
5 is 0b101 and the object_id for 5 is 11 and 11 in binary is 0b1011.
4711 is 0b0001001001100111, shift left and set the bit and you get 0b0010010011001111 and that is 9423 which happens to be the object_id for z above.
This behaviour is most probably implementation specific but I don't know of a Ruby implementation that doesn't handle Fixnum this way.
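The bit arithmetic above can be checked directly. This is a minimal sketch assuming MRI, where small integers are immediates; fixnum_object_id is a hypothetical helper name, not part of Ruby:
```ruby
# Hypothetical helper: predicts the object_id MRI gives a small integer
# by shifting left one bit and setting the least significant bit.
def fixnum_object_id(n)
  (n << 1) | 1
end

[5, 4711].each do |n|
  puts "#{n}: predicted #{fixnum_object_id(n)}, actual #{n.object_id}"
end
```
On other implementations the actual value may differ, since the encoding is an implementation detail.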
There are at least three more immediate objects in Ruby: false, true and nil.
>> false.object_id
=> 0
>> true.object_id
=> 2
>> nil.object_id
=> 4

I don't know of a way of having the exact address, but maybe you're looking for something like the object_id method?
Extract from its documentation
Returns an integer identifier for obj.
The same number will be returned on all calls to id for a given object, and no two active objects will share an id
Example:
> 5.object_id
=> 11
> true.object_id
=> 2

Ruby Memory Validator does give you the memory address for the object.
Joe Damato's work (http://timetobleed.com/plugging-ruby-memory-leaks-heapstack-dump-patches-to-help-take-out-the-trash) and (http://timetobleed.com/memprof-a-ruby-level-memory-profiler) is based on the work Software Verification did to create a Ruby memory inspection API (http://www.softwareverify.com/ruby/customBuild/index.html).
Joe describes that on his blog. Therefore Joe's work should also return the appropriate addresses. I'm not fully up to speed with the latest version of Joe's work - he only told me about the first version, not the latest version, but nonetheless, if you are tracking memory allocations in the underpinnings of Ruby, you are tracking the addresses of the objects that hold whatever it is you are allocating.
That doesn't mean you can dereference the address and read the data value you expect to find at that address. Dereferencing the address will point you to the internals of a basic Ruby Object. Ruby objects are a basic object which then store additional data alongside, so knowing the actual address is not very useful unless you are writing a tool like Ruby Memory Validator or memprof.
How do I know the above about Ruby Memory Validator and the API we released? I designed Ruby Memory Validator. I also wrote the assembly language bits that intercept the Ruby calls that allocate the memory.

What exactly are you trying to do?
Keep in mind that a Ruby object is not directly analogous to a variable in a language like C or C++. For example:
a = "foo"
b = a
b[2] = 'b'
b
=> "fob"
a
=> "fob"
a == b
=> true
a.object_id
=> 23924940
b.object_id
=> 23924940
a.object_id == b.object_id
=> true
Even though a and b are separate variables, they are references to the same underlying data and have the same object_id.
If you find yourself needing to take the address of a variable, there is probably an easier approach to whatever you are trying to do.
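If the goal is simply to tell whether two variables refer to the same object, or to get an independent copy, a sketch like the following may be enough. It uses equal? for identity and dup for a copy; the variable names are only illustrative:
```ruby
a = String.new("foo")   # String.new gives a mutable string
b = a                   # second reference to the same object
c = a.dup               # independent copy

b[2] = "b"

puts a                  # fob (changed through b)
puts c                  # foo (the copy is unaffected)
puts a.equal?(b)        # true: same object
puts a.equal?(c)        # false: different objects
```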

Since you indicated (buried in a comment somewhere) that you're really just trying to understand how Ruby references things, I think things work as follows:
A VALUE in Ruby's C API represents either an immediate object (nil, a Fixnum or a Boolean) or a pointer to an Object. The VALUE contains a 3-bit tag indicating which of these it is, and contains the value (for the first three) or a direct memory pointer (for an Object). There's no way to get at the VALUE directly in Ruby. (I'm not sure if the object_id is the same or different.)
Note that JRuby operates differently.

Related

Why can't I overwrite self in the Integer class?

I want to be able to write number.incr, like so:
num = 1; num.incr; num
#=> 2
The error I'm seeing states:
Can't change the value of self
If that's true, how do bang! methods work?
You cannot change the value of self
An object is a class pointer and a set of instance variables (note that this link is an old version of Ruby, because it's dramatically simpler, and thus better for explanatory purposes).
"Pointing" at an object means you have a variable which stores the object's location in memory. Then to do anything with the object, you first go to the location in memory (we might say "follow the pointer") to get the object, and then do the thing (e.g. invoke a method, set an ivar).
All Ruby code everywhere is executing in the context of some object. This is where your instance variables get saved, it's where Ruby looks for methods that don't have a receiver (e.g. $stdout is the receiver in $stdout.puts "hi", and the current object is the receiver in puts "hi"). Sometimes you need to do something with the current object. The way to work with objects is through variables, but what variable points at the current object? There isn't one. To fill this need, the keyword self is provided.
self acts like a variable in that it points at the location of the current object. But it is not like a variable, because you can't assign it a new value. If you could, the code after that point would suddenly be operating on a different object, which is confusing and has no benefits over just using a variable.
Also remember that the object is tracked by variables which store memory addresses. What is self = 2 supposed to mean? Does it only mean that the current code operates as if it were invoked on 2? Or does it mean that all variables pointing at the old object now have their values updated to point at the new one? It isn't really clear, but the former unnecessarily introduces an identity crisis, and the latter is prohibitively expensive and introduces situations where it's unclear what is correct (I'll go into that a bit more below).
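A small sketch of the point above: Ruby rejects an assignment to self when the code is parsed, so the error can only be observed at runtime by handing the code to eval as a string. The helper name assigning_self_error is made up for illustration:
```ruby
# Hypothetical helper: returns the exception raised by "self = 2",
# or nil if the assignment were somehow accepted.
def assigning_self_error
  eval("self = 2")
  nil
rescue SyntaxError => e
  e
end

error = assigning_self_error
puts error.class     # SyntaxError
puts error.message
```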
You cannot mutate Fixnums
Some objects are special at the C level in Ruby (false, true, nil, fixnums, and symbols).
Variables pointing at them don't actually store a memory location. Instead, the address itself stores the type and identity of the object. Wherever it matters, Ruby checks to see if it's a special object (e.g. when looking up an instance variable), and then extracts the value from it.
So there isn't a spot in memory where the object 123 is stored. Which means self contains the idea of Fixnum 123 rather than a memory address like usual. As with variables, it will get checked for and handled specially when necessary.
Because of this, you cannot mutate the object itself (though it appears they keep a special global variable to allow you to set instance variables on things like Symbols).
Why are they doing all of this? To improve performance, I assume. A number stored in a register is just a series of bits (typically 32 or 64), which means there are hardware instructions for things like addition and multiplication. That is to say, the ALU is wired to perform these operations in a single clock cycle, rather than writing the algorithms in software, which would take many orders of magnitude longer. By storing them like this, they avoid the cost of storing and looking up the object in memory, and they gain the advantage that they can directly add the two pointers using hardware. Note, however, that there are still some additional costs in Ruby that you don't have in C (e.g. checking for overflow and converting the result to Bignum).
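On recent MRI versions (an assumption; older releases behaved differently, as noted above for Symbols) the immutability can be seen directly: integers report themselves as frozen and refuse instance variables. try_set_ivar is a hypothetical helper for this sketch:
```ruby
# Hypothetical helper: tries to attach an instance variable and returns
# the exception class, or nil if the assignment succeeded.
def try_set_ivar(obj)
  obj.instance_variable_set(:@note, "hi")
  nil
rescue StandardError => e
  e.class
end

puts 123.frozen?                       # true on MRI 2.x and later
puts try_set_ivar(123).inspect         # an error class (FrozenError on recent Rubies)
puts try_set_ivar(Object.new).inspect  # nil: ordinary objects accept ivars
```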
Bang methods
You can put a bang at the end of any method. It doesn't require the object to change, it's just that people usually try to warn you when you're doing something that could have unexpected side-effects.
class C
  def initialize(val)
    @val = val              # => 12
  end                       # => :initialize
  def bang_method!
    "My val is: #{@val}"    # => "My val is: 12"
  end                       # => :bang_method!
end                         # => :bang_method!
c = C.new 12                # => #<C:0x007fdac48a7428 @val=12>
c.bang_method!              # => "My val is: 12"
c                           # => #<C:0x007fdac48a7428 @val=12>
Also, there are no bang methods on integers. It wouldn't fit with the paradigm:
Fixnum.instance_methods.grep(/!$/) # => [:!]
# Okay, there's one, but it's actually a boolean negation
1.! # => false
# And it's not a Fixnum method, it's an inherited boolean operator
1.method(:!).owner # => BasicObject
# In reality, you call it this way; the interpreter translates it
!1 # => false
Alternatives
Make a wrapper object: I'm not going to advocate this one, but it's the closest to what you're trying to do. Basically, create your own class, which is mutable, and then make it look like an integer. There's a great blog post walking through this at http://blog.rubybestpractices.com/posts/rklemme/019-Complete_Numeric_Class.html; it will get you 95% of the way there.
Don't depend directly on the value of a Fixnum: I can't give better advice than this without knowing what you're trying to do / why you feel this is a need.
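A minimal sketch of the wrapper idea, not the full numeric class from the linked post: Counter is a hypothetical class that holds an Integer and replaces it on incr, so the wrapper is mutable even though the Integer is not.
```ruby
# Hypothetical wrapper class, for illustration only.
class Counter
  attr_reader :value

  def initialize(value = 0)
    @value = value
  end

  # Mutates the wrapper, not the Integer it holds.
  def incr
    @value += 1
    self
  end

  def to_i
    @value
  end

  def to_s
    @value.to_s
  end
end

num = Counter.new(1)
num.incr
puts num.value   # 2
```
A real replacement for Integer would also need arithmetic, comparison and coerce, which is where most of the remaining work lies.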
Also, you should show your code when you ask questions like this. I misunderstood how you were approaching it for a long time.
It's simply impossible to change self to another object. self is the receiver of the message send. There can be only one.
If that's true, how do bang! methods work?
The bang (!) is simply part of the method name. It has absolutely no special meaning whatsoever. It is a convention among Ruby programmers to name surprising variants of less surprising methods with a bang, but that's just that: a convention.
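The convention is easiest to see on String, where upcase returns a new string and upcase! changes the receiver. A short sketch:
```ruby
s = String.new("abc")   # String.new gives a mutable string

t = s.upcase            # non-bang: returns a new string, s is untouched
puts s                  # abc
puts t                  # ABC

s.upcase!               # bang variant: modifies s in place
puts s                  # ABC
```
Nothing in the language enforces this; the two methods simply have different names and different implementations.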

Is it possible to redefine 0 in ruby?

I'm not actually going to use this in anything in case it does actually work but is it possible to redefine 0 to act as 1 in Ruby and 1 to act as 0? Where does FixNum actually hold its value?
No, I don't think so. I'd be very surprised if you managed to. If you start overriding Fixnum's methods/operators, you maaaybe might get near that (i.e. override + so that 1+5 => 5, 0+5 => 6 etc), but you will not get full replacement of literal '0' with value 1. At least marshalling to native would expose the real 0 value of the Fixnum(0).
To be honest, I'm not really sure if you can even override the core operations like + op on a Fixnum. That could break so many things..
As far as I remember from the 1.8.3 source, simple integers and doubles are held right inside a 'value' and are copied all around *). There is no singular "0", "1" or "1000" value. There is no extra dereference that would allow you to swap all the values with one shot. I doubt it changed in 1.9 and I doubt anyone got any weird idea about that in 2.0. But I don't actually know. Still, that would be strange. No platform I know interns integers and floats. Strings, sometimes array literals, but numbers?
So, sorry, no #define true false jokes :)
--
*) clarification from Jörg W Mittag (thanks, this is exactly what I was referring to):
(..) Fixnums do not have a place in memory, their pointer value is "magic" (in that it cannot possibly occur in a Ruby program) and treated specially by the runtime system. Read up on "tagged pointer representation", e.g. here.
Assignment does not alias Fixnum objects. There is effectively only one Fixnum object instance for any given integer value, so, for example, you cannot add a singleton method to a Fixnum. Any attempt to add a singleton method to a Fixnum object will raise a TypeError. Source
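The TypeError mentioned above is easy to reproduce. In this sketch, singleton_error_for is a hypothetical helper that attempts to define a singleton method and reports what happened:
```ruby
# Hypothetical helper: returns the exception class raised when defining
# a singleton method on obj, or nil if it worked.
def singleton_error_for(obj)
  obj.define_singleton_method(:answer) { 42 }
  nil
rescue TypeError => e
  e.class
end

puts singleton_error_for(1).inspect           # TypeError
puts singleton_error_for(Object.new).inspect  # nil
```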
That pretty much means you can't edit a Fixnum and therefore can't redefine 0 or 1 in native Ruby.
Though as these Fixnums are also Objects, they have unique object ids that clearly reference them somewhere in memory. See BasicObject#__id__
If you can locate the memory space where the 0 and 1 objects are and switch these, you should have effectively switched 0 and 1 behavior in Ruby, as now either will reference the other object.
So to answer your question: No, redefining Fixnums is not possible in Ruby; switching their behaviour should be possible though.

Why is it not a good idea to dynamically create a lot of symbols in ruby (for versions before 2.2)?

What is the function of a symbol in Ruby? What's the difference between a string and a symbol?
Why is it not a good idea to dynamically create a lot of symbols?
Symbols are like strings but they are immutable - they can't be modified.
They are only put into memory once, making them very efficient to use for things like keys in hashes but they stay in memory until the program exits. This makes them a memory hog if you misuse them.
If you dynamically create lots of symbols, you are allocating a lot of memory that can't be freed until your program ends (edit: this is no longer the case since Ruby 2.2). You should only dynamically create symbols (using string.to_sym) if you know you will:
need to repeatedly access the symbol
not need to modify them
As I said earlier, they are useful for things like hashes - where you care more about the identity of the variable than its value. Symbols, when correctly used, are a readable and efficient way to pass around identity.
I will explain what I mean about the immutability of symbols RE your comment.
Strings are like arrays; they can be modified in place:
12:17:44 ~$ irb
irb(main):001:0> string = "Hello World!"
=> "Hello World!"
irb(main):002:0> string[5] = 'z'
=> "z"
irb(main):003:0> string
=> "HellozWorld!"
irb(main):004:0>
Symbols are more like numbers; they can't be edited in place:
irb(main):011:0> symbol = :Hello_World
=> :Hello_World
irb(main):012:0> symbol[5] = 'z'
NoMethodError: undefined method `[]=' for :Hello_World:Symbol
from (irb):12
from :0
A symbol is the same object and the same allocation of memory no matter where it is used:
>> :hello.object_id
=> 331068
>> a = :hello
=> :hello
>> a.object_id
=> 331068
>> b = :hello
=> :hello
>> b.object_id
=> 331068
>> a = "hello"
=> "hello"
>> a.object_id
=> 2149256980
>> b = "hello"
=> "hello"
>> b.object_id
=> 2149235120
>> b = "hell" + "o"
Two strings which are 'the same' in that they contain the same characters may not reference the same memory, which can be inefficient if you're using strings for, say, hashes.
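The same comparison can be written with equal?, which tests object identity rather than content. A small sketch (String.new is used so each string is a separate allocation regardless of frozen-string settings):
```ruby
sym_a = :hello
sym_b = ("hel" + "lo").to_sym   # built at runtime, still the same symbol

puts sym_a.equal?(sym_b)        # true: one object per symbol name

str_a = String.new("hello")
str_b = String.new("hello")

puts str_a == str_b             # true: same characters
puts str_a.equal?(str_b)        # false: two separate objects
```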
So, symbols can be useful for reducing memory overhead. However, they are a memory leak waiting to happen, because symbols cannot be garbage collected once created. Creating thousands and thousands of symbols would allocate memory that can never be recovered. Yikes!
It can be particularly bad to create symbols from user input without validating the input against some kind of a white-list (for example, for query string parameters in RoR). If user input is converted to symbols without validation, a malicious user can cause your program to consume large amounts of memory that will never be garbage collected.
Bad (a symbol is created regardless of user input):
name = params[:name].to_sym
Good (a symbol is only created if the user input is allowed):
whitelist = ['allowed_value', 'another_allowed_value']
raise ArgumentError unless whitelist.include?(params[:name])
name = params[:name].to_sym
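A variant of the same idea, sketched without Rails: the lookup table below maps each permitted string to an existing symbol, so to_sym is never called on user input at all. ALLOWED_NAMES and safe_symbol are hypothetical names for this example.
```ruby
# Hypothetical whitelist: permitted input strings and their symbols.
ALLOWED_NAMES = {
  "allowed_value"         => :allowed_value,
  "another_allowed_value" => :another_allowed_value
}.freeze

# Returns the symbol for a permitted string, raises ArgumentError otherwise.
def safe_symbol(input)
  ALLOWED_NAMES.fetch(input) do
    raise ArgumentError, "unexpected value: #{input.inspect}"
  end
end

puts safe_symbol("allowed_value").inspect   # :allowed_value
```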
Starting with Ruby 2.2, symbols are automatically garbage collected, so this should not be an issue.
If you are using Ruby 2.2.0 or later, it should usually be OK to dynamically create a lot of symbols, because they will be garbage collected according to the Ruby 2.2.0-preview1 announcement, which has a link to more details about the new symbol GC. However, if you pass your dynamic symbols to some kind of code that converts it to an ID (an internal Ruby implementation concept used in the C source code), then in that case it will get pinned and never get garbage collected. I'm not sure how commonly that happens.
You can think of symbols as a name of something, and strings (roughly) as a sequence of characters. In many cases you could use either a symbol or a string, or you could use a mixture of the two. Symbols are immutable, which means they can't be changed after being created. The way symbols are implemented, it is very efficient to compare two symbols to see if they are equal, so using them as keys to hashes should be a little faster than using strings. Symbols don't have a lot of the methods that strings do, such as start_with?, so you would have to use to_s to convert the symbol into a string before calling those methods.
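A short sketch of the to_s conversion described above; the symbol name is arbitrary:
```ruby
name = :user_name

# Convert to a String first, then use String methods.
puts name.to_s.start_with?("user")     # true
puts name.to_s.upcase.to_sym.inspect   # :USER_NAME

# Comparing symbols needs no conversion.
puts name == :user_name                # true
```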
You can read more about symbols here in the documentation:
http://www.ruby-doc.org/core-2.1.3/Symbol.html

Constant Assignment Bug in Ruby?

We caught some code in Ruby that seems odd, and I was wondering if someone could explain it:
$ irb
irb(main):001:0> APPLE = 'aaa'
=> "aaa"
irb(main):002:0> banana = APPLE
=> "aaa"
irb(main):003:0> banana << 'bbb'
=> "aaabbb"
irb(main):004:0> banana
=> "aaabbb"
irb(main):005:0> APPLE
=> "aaabbb"
Catch that? The constant was appended to at the same time the local variable was.
Known behavior? Expected?
Known behaviour. Being a constant doesn't mean that you can't modify the object it refers to, merely that Ruby will give you a warning (and only a warning) if you assign it to a different object.
In short, ruby constants aren't.
Note: This behaviour is listed in an answer to "What are the Ruby Gotchas a newbie should be warned about?" It's a worthwhile read.
Catch that? The constant was appended to at the same time the local variable was.
No, it wasn't appended to, and neither was the local variable.
The single object that both the constant and the local variable are referring to was appended to, but neither the constant nor the local variable was changed. You cannot modify or change a variable or constant in Ruby (at least not in the way that your question implies), the only thing you can change is objects.
The only two things you can do with variables or constants is dereferencing them and assigning to them.
The constant is a red herring here, it is completely irrelevant to the example given. The only thing that is relevant is that there is only one single object in the entire example. That single object is accessible under two different names. If the object changes, then the object changes. Period. It does not mysteriously split itself in two. Which name you use to look at the changed object doesn't matter. There is only one object anyway.
This works exactly the same as in any other programming language: if you have multiple references to a mutable object in, say, Python, Java, C#, C++, C, Lisp, Smalltalk, JavaScript, PHP, Perl or whatever, then any change to that object will be visible no matter what reference is used, even if some of those references are final or const or whatever that particular language calls it.
This is simply how shared mutable state works and is just one of the many reasons why shared mutable state is bad.
In Ruby, you can generally call the freeze method on any object to make it immutable. However, again, you are modifying the object here, so anybody else who has a reference to that object will all of a sudden find that the object has become immutable. Therefore, just to be safe, you need to copy the object first, by calling dup. But of course, that's not enough either. Think of an array, for example: if you dup the array, you get a different array, but the objects inside the array are still the same ones as in the original array. And if you freeze the array, then you can no longer modify the array, but the objects in the array may very well still be mutable:
ORIG = ['Hello']
CLONE = ORIG.dup.freeze
CLONE[0] << ', World!'
CLONE # => ['Hello, World!']
That's shared mutable state for you. The only way to escape this madness is either to give up shared state (e.g. Actor Programming: if nobody else can see it, then it doesn't matter how often or when it changes) or mutable state (i.e. Functional Programming: if it never changes, then it doesn't matter how many others see it).
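If a fully independent, fully frozen copy is what you want, one possible sketch is a deep copy followed by a recursive freeze. deep_freeze is a hypothetical helper, and the Marshal round trip assumes the data contains only marshallable objects (no procs, IO objects and so on):
```ruby
# Hypothetical helper: freezes obj and everything reachable through
# nested Arrays and Hashes.
def deep_freeze(obj)
  case obj
  when Hash
    obj.each { |key, value| deep_freeze(key); deep_freeze(value) }
  when Array
    obj.each { |element| deep_freeze(element) }
  end
  obj.freeze
end

orig = [String.new("Hello")]
copy = deep_freeze(Marshal.load(Marshal.dump(orig)))  # deep copy, then freeze

orig[0] << ", World!"     # the original is still mutable
puts orig.inspect         # ["Hello, World!"]
puts copy.inspect         # ["Hello"]
puts copy[0].frozen?      # true
```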
The fact that one of the two variables in the original example is actually a constant is completely irrelevant to the problem. There are two main differences between a variable and a constant in Ruby: they have different lookup rules, and constants generate a warning if they are assigned to more than once. But in this example, the lookup rules are irrelevant and the constant is assigned to only once, so there really is no difference between a variable and a constant in this case.
You can freeze constants if you want them to be unchangeable:
>> APPLE = 'aaa'
=> "aaa"
>> banana = APPLE
=> "aaa"
>> APPLE.freeze
=> "aaa"
>> banana.frozen?
=> true
>> banana << 'bbb'
TypeError: can't modify frozen string
from (irb):5:in `<<'
from (irb):5
Constants in Ruby aren't "constants". You might as well use any other name; putting them in all caps doesn't change anything, interpreter-wise, about the object, unless you try to change the pointer's address.
If you look at it that way, the behavior is obvious and necessary: APPLE is a pointer to a string object, and so is banana. You then edit the object that banana is pointing to. APPLE is pointing to that same object, so the change is reflected there too.
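One way to keep the constant's value intact, sketched here as a suggestion rather than the only option, is to freeze the object when the constant is defined and hand out copies with dup, which returns an unfrozen copy:
```ruby
APPLE = "aaa".freeze     # freeze the object the constant refers to

banana = APPLE.dup       # dup returns an unfrozen copy
banana << "bbb"          # only the copy changes

puts APPLE               # aaa
puts banana              # aaabbb
```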

Does ruby's object_id method refer to the memory location?

Or does this method just indicate a unique integer that each object has?
It is a combination of many parameters, value, object type, place in memory.
More can be read here
It isn't a direct reference to the memory location and the "encoding" is specific to a particular Ruby implementation. If you can read C code, you may find it instructive to look at the rb_obj_id and id2ref methods in gc.c in the Ruby 1.8.6 source. You can also read more about the "encoding" in the "Objects embedded in VALUE" section of the partial translation of the Ruby Hacking Guide chapter 2.
It's worth noting that you can perform a reverse-lookup of object IDs using:
ObjectSpace._id2ref(object_id)
For example:
ObjectSpace._id2ref(0) #=> false
ObjectSpace._id2ref(1) #=> 0
ObjectSpace._id2ref(2) #=> true
ObjectSpace._id2ref(3) #=> 1
ObjectSpace._id2ref(4) #=> nil
Well, it depends on what you mean by "ruby" ;) In JRuby it's just a unique integer as far as I can tell.
Also, things like numbers aren't the memory location. I forget all the details and am sure someone will give them to you.
irb(main):020:0> 1.object_id
=> 3
irb(main):021:0> (2-1).object_id
=> 3
In "normal" ruby (MRI 1.8.x and 1.9.x) it's just a unique value.
This is also the case in IronRuby

Resources