Ruby object to_s what is the encoding of the object id? - ruby

In Ruby, the to_s on an object includes an encoding of the object's id.
[2] pry(main)> shape = Shape.new(4,4)
=> #<Shape:0x00007fac5eb6afc8 #num_sides=4, #side_length=4>
In the documentation it says
Returns a string representing obj. The default to_s prints the object’s class and an encoding of the object id.
https://apidock.com/ruby/Object/to_s
In the example above, the encoding of the object id is 0x00007fac5eb6afc8.
In How does object_id assignment work? they explain
In MRI the object_id of an object is the same as the VALUE that represents the object on the C level.
So I compared to the object_id and it is not the same as the encoding of the object id.
[2] pry(main)> shape = Shape.new(4,4)
=> #<Shape:0x00007fac5eb6afc8 #num_sides=4, #side_length=4>
[3] pry(main)> shape.object_id
=> 70189150066660
What exactly is the encoding of the object id? It does not appear to be the object_id.

Think of the object_id, or __id__ as the "pointer" for the object. It is not technically a pointer, but does contain a unique value that can be used to retrieve the internal C VALUE.
There are patterns to the value it has for some data types, as you can see with its hexadecimal representation with to_s. I am will not go into all the details, as there are already numerous answers on SO explaining, and already linked from comments, but integers (up to a FIXNUM_MAX, have predictable values, and special constants like true, false, and nil will always have the same object_id in every run.
To put simply, it is nothing more than a number, and shown as a hexadecimal (base 16) value, not any actual "encoding" or cypher.
Going to expand upon this a bit more in light of your latest edits to the question. As you posted, the hexadecimal number you see in to_s is the value of the internal C VALUE of the object. VALUE is a C data type (unsigned, pointer size number) that every Ruby object is represented as in C code. As #Stefan pointed out in a comment, for non-integer types (I speak only for MRI version), it is twice the value of the object_id. Not that you probably care, but you can shift the bits of an integer to predict the value for those.
Therefore, using you example.
A value of 0x00007fac5eb6afc8 is simple hexadecimal notation for a number. It uses a base 16 counting system as opposed to the base 10 decimal system we are more used to in everyday life. It is simply a different way of looking at the same number.
So, using that logic.
a = 0x00007fac5eb6afc8
#=> 140378300133320 # Decimal representation
a /= 2 # Remember, non-integers are half of this value
#=> 70189150066660 # Your object_id

The best answer you can get is: You don't know, and you shouldn't need to.
Ruby guarantees exactly three things about object IDs:
An object has the same ID during its lifetime.
No two objects have the same ID at the same time.
IDs are integers.
In particular, this means that you cannot rely on a specific object having a specific ID (for example, nil having ID 8). It also means that IDs can be re-used. You should think of it as nothing but opaque identifier.
And, as you quoted, the default Object#to_s uses "some" encoding of the ID.
And that is all you know, and all you should ever rely on. In particular, you should never try to parse IDs or Object#to_s.
So, the ID part of Object#to_s is "some unspecified encoding" of the ID, which itself is "some opaque identifier".
Everything else is deliberately left unspecified, so that different implementations can make different choices that make sense for their specific needs. For example, it would be stupid to tie object IDs to memory addresses, because implementations like JRuby, Opal, IronPython, MagLev, and Topaz run on platforms where the concept of "memory address" doesn't even exist! And Rubinius uses a moving garbage collector, where objects can move around in memory and thus their address changes.

Related

What is the difference between being immutable and the fact that there can only be one instance of a Symbol?

I'm reading Eloquent Ruby, and am on Chapter 6 on Symbols. Some excerpts:
"There can only ever be one instance of any given symbol. If I mention :all twice in my code, it is always the same :all."
a = :all
b = :all
puts a.object_id, b.object_id # same objects
"Another aspect of symbols that makes them so well suited to their chosen career is that symbols are immutable - once you create that :all symbol, it will be :all until the end of time (or at least until your Ruby interpreter exits)"
What is the difference between being immutable and the fact that there can only be one instance of you?
By the way, I would like to write the previous sentence more accurately: "What is the difference between a class being immutable and the fact that there can only be one instance of the class?" Is class the right word to insert there?
How would you even go about trying to mutate a symbol, they don't seem to hold values like other variables?
Immutable means that an object cannot be changed. In Ruby, symbols are immutable. To make a symbol mutable, you have to perform type conversion to a string, which is mutable.
a = :mystring
a = a.to_s
=> "mystring"
For proof that a symbol is immutable, you can call the frozen? property on it.
a.frozen?
=> true
Note that symbols cannot be unfrozen unlike strings which have an unfreeze method.
For object ids
In Ruby, the object_id of an object is the same as the VALUE that represents the object on the C level. For most objects, this points to a location in memory where the object data is stored. This varies over time because it depends on where the system decided to allocate its memory.
Symbols have the same object id because they are meant to represent a SINGLE value.
To check this out, let's type to the console the same symbol multiple times.
:z.object_id
=> 636328
:z.object_id
=> 636328
:z.object_id
=> 636328
Now, let's try the same thing only with strings
"z".object_id
=> 21237740
"z".object_id
=> 24355380
As you can see, here we have two references to the string z, both of which are different objects. Thus, they have different object_ids.
This also means that symbols can save quite a bit of memory, especially if we are dealing with big data. Because symbols are the same object, it's faster to compare them then it is strings. Strings require comparing the values instead of the object ids.
Your sentence is fine; you're not sure of the common phrase used to describe a class with only one instance. I'll explain that as I go along.
An object that is immutable cannot change through any operations done on it. This means that any operation that would change a symbol would generate a new one instead.
:foo.object_id # 1520028
:foo.upcase.object_id # 70209716662240
:foo.capitalize.object_id # 70209719120060
You can certainly write objects that are immutable, or make them immutable (with some caveats) via freeze, but you can always create a new instance of them.
f = "foo"
f.freeze
f1 = "foo"
puts f.object_id == f1.object_id # false
An object that only ever has one instance of itself is considered to be a singleton.
If there's only one instance of it, then you only store it in memory once.
If you attempt to create it, you only get the previously existing object back.

Why can't I overwrite self in the Integer class?

I want to be able to write number.incr, like so:
num = 1; num.incr; num
#=> 2
The error I'm seeing states:
Can't change the value of self
If that's true, how do bang! methods work?
You cannot change the value of self
An object is a class pointer and a set of instance methods (note that this link is an old version of Ruby, because its dramatically simpler, and thus better for explanatory purposes).
"Pointing" at an object means you have a variable which stores the object's location in memory. Then to do anything with the object, you first go to the location in memory (we might say "follow the pointer") to get the object, and then do the thing (e.g. invoke a method, set an ivar).
All Ruby code everywhere is executing in the context of some object. This is where your instance variables get saved, it's where Ruby looks for methods that don't have a receiver (e.g. $stdout is the receiver in $stdout.puts "hi", and the current object is the receiver in puts "hi"). Sometimes you need to do something with the current object. The way to work with objects is through variables, but what variable points at the current object? There isn't one. To fill this need, the keyword self is provided.
self acts like a variable in that it points at the location of the current object. But it is not like a variable, because you can't assign it new value. If you could, the code after that point would suddenly be operating on a different object, which is confusing and has no benefits over just using a variable.
Also remember that the object is tracked by variables which store memory addresses. What is self = 2 supposed to mean? Does it only mean that the current code operates as if it were invoked 2? Or does it mean that all variables pointing at the old object now have their values updated to point at the new one? It isn't really clear, but the former unnecessarily introduces an identity crisis, and the latter is prohibitively expensive and introduce situations where it's unclear what is correct (I'll go into that a bit more below).
You cannot mutate Fixnums
Some objects are special at the C level in Ruby (false, true, nil, fixnums, and symbols).
Variables pointing at them don't actually store a memory location. Instead, the address itself stores the type and identity of the object. Wherever it matters, Ruby checks to see if it's a special object (e.g. when looking up an instance variable), and then extracts the value from it.
So there isn't a spot in memory where the object 123 is stored. Which means self contains the idea of Fixnum 123 rather than a memory address like usual. As with variables, it will get checked for and handled specially when necessary.
Because of this, you cannot mutate the object itself (though it appears they keep a special global variable to allow you to set instance variables on things like Symbols).
Why are they doing all of this? To improve performance, I assume. A number stored in a register is just a series of bits (typically 32 or 64), which means there are hardware instructions for things like addition and multiplication. That is to say the ALU, is wired to perform these operations in a single clock cycle, rather than writing the algorithms with software, which would take many orders of magnitude longer. By storing them like this, they avoid the cost of storing and looking the object in memory, and they gain the advantage that they can directly add the two pointers using hardware. Note, however, that there are still some additional costs in Ruby, that you don't have in C (e.g. checking for overflow and converting result to Bignum).
Bang methods
You can put a bang at the end of any method. It doesn't require the object to change, it's just that people usually try to warn you when you're doing something that could have unexpected side-effects.
class C
def initialize(val)
#val = val # => 12
end # => :initialize
def bang_method!
"My val is: #{#val}" # => "My val is: 12"
end # => :bang_method!
end # => :bang_method!
c = C.new 12 # => #<C:0x007fdac48a7428 #val=12>
c.bang_method! # => "My val is: 12"
c # => #<C:0x007fdac48a7428 #val=12>
Also, there are no bang methods on integers, It wouldn't fit with the paradigm
Fixnum.instance_methods.grep(/!$/) # => [:!]
# Okay, there's one, but it's actually a boolean negation
1.! # => false
# And it's not a Fixnum method, it's an inherited boolean operator
1.method(:!).owner # => BasicObject
# In really, you call it this way, the interpreter translates it
!1 # => false
Alternatives
Make a wrapper object: I'm not going to advocate this one, but it's the closest to what you're trying to do. Basically create your own class, which is mutable, and then make it look like an integer. There's a great blog post walking through this at http://blog.rubybestpractices.com/posts/rklemme/019-Complete_Numeric_Class.html it will get you 95% of the way there
Don't depend directly on the value of a Fixnum: I can't give better advice than this without knowing what you're trying to do / why you feel this is a need.
Also, you should show your code when you ask questions like this. I misunderstood how you were approaching it for a long time.
It's simply impossible to change self to another object. self is the receiver of the message send. There can be only one.
If that's true, how do bang! methods work?
The bang (!) is simply part of the method name. It has absolutely no special meaning whatsoever. It is a convention among Ruby programmers to name surprising variants of less surprising methods with a bang, but that's just that: a convention.

Does the actual value of a enum class enumeration remain constant/invariant?

Given code for an incomplete server like:
enum class Command : uint32_t {
LOGIN,
MESSAGE,
JOIN_CHANNEL,
PART_CHANNEL,
INVALID
};
Can I expect that converting Command::LOGIN to an integer will always give the same value?
Across compilers?
Across compiler versions?
If I add another enumeration?
If I remove an enumeration?
Converting Command::LOGIN would look something like this:
uint32_t number = static_cast<uint32_t>(Command::LOGIN);
Some extra information on what I am doing here. This enumeration is fed onto the wire by converting it to an integer sending it along to the server/client. I do not really particularly care what the number is, as long as it will always stay the same. If it will not stay the same, then obviously I will have to provide my own numbers through the usual way.
Now my sneaking suspicion is that it will change depending on what compiler was used to compile the code, but I would like to know for sure.
Bonus question: How does the compiler/language determine what number to use for Command::LOGIN?
Before submitting this question, I have noticed some changes from say 3137527848 to 0 and back, so it is obviously not valid to rely on it not changing. I am still curious about how this number is determined, and how or why that number is changing.
From the C++11 Standard (or rather, n3485):
[dcl.enum]/2
If the first enumerator has no initializer, the value of the corresponding constant is zero. An enumerator-definition without an initializer gives the enumerator the value obtained by increasing the value of the previous enumerator by one.
Additionally, [expr.static.cast]/9
A value of a scoped enumeration type can be explicitly converted to an integral type. The value is unchanged if the original value can be represented by the specified type.
I think it's obvious that the values of the enumerators can be represented by uint32_t; if they weren't, [dcl.enum]/5 says "if the initializing value of an enumerator cannot be represented by the underlying type, the program is ill-formed."
So as long as you use the underlying type for conversion (either explicitly or via std::underlying_type<Command>::type), the value of those enumerators are fixed as long as you don't add any enumerators before them (in the same enumeration) or alter their order.
As Nicolas Louis Guillemo pointed out, be aware of possible different endianness when transferring the value.
If you assign explicit integer values to your enum constants then you are guaranteed to always have the same value when converting to the integer type.
Just do something like the following:
enum class Command : uint32_t {
LOGIN = 12,
MESSAGE = 46,
JOIN_CHANNEL = 5,
PART_CHANNEL = 0,
INVALID = 42
};
If you don't specify any values explicitly, the values are set implicitly, starting from zero and increasing by one with each move down the list.
Quoting from draft n3485:
[dcl.enum] paragraph 2
The enumeration type declared with an enum-key of only enum is an
unscoped enumeration, and its enumerators are unscoped enumerators.
The enum-keys enum class and enum struct are semantically equivalent;
an enumeration type declared with one of these is a scoped
enumeration, and its enumerators are scoped enumerators. [...] The
identifiers in an enumerator-list are declared as constants, and can
appear wherever constants are required. An enumerator-definition with
= gives the associated enumerator the value indicated by the constant-expression. If the first enumerator has no initializer, the
value of the corresponding constant is zero. An
enumerator-definition without an initializer gives the enumerator the
value obtained by increasing the value of the previous enumerator by
one.
The drawback of relying on this, is that if the list order somehow changes in the future, then your code might silently break, so I would advise you be explicit.
Command::LOGIN will always be 0 as long as it's the first enum in the list. Just be careful with the rest of the enums, because they will have different binary representations based on if the computer is using big endian or little endian.

Ruby: Is variable is object in ruby?

I heard that everything in ruby is object. I replied in an interview that a variable is an object, and the interviewer said NO. Anybody know the truth?
"In ruby, everything is an object" is basically true.
But more accurately, I would say that any value that can be assigned to a variable or returned from a method is an object. Is a variable an object? Not really. A variable is simply a name of an object (also known as a "pointer") that allows you locate it in memory and do stuff with it.
shajin = Person.new()
In this snippet, we have a variable shajin, which points to an object (an instance of the person class). The variable is simply the identifier for an object, but is not the object itself.
I think it was a trick question. Ultimately object orientation is feature for humans to understand complex programs, but computers are not object oriented themselves. Drill down enough layers and objects cease to exist in any language.
So perhaps it's more fair to say: "In ruby, everything important is an object".
Why not go directly to the source? The Ruby Language Specification couldn't be more clear and obvious (emphasis added by me):
6.2 Variables
6.2.1 General description
A variable is denoted by a name, and refers to an object, which is called the value of the variable.
A variable itself is not an object.
http://www.techotopia.com/index.php/Understanding_Ruby_Variables
"A variable in Ruby is just a label for a container.
A variable could contain almost anything - a string, an array, a hash.
A variable name may only contain lowercase letters, numbers, and underscores.
A variable name should ideally make sense in the context of your program."
"We'll begin with the fact that Ruby is a completelyobject-orientated language. Every value is an object (...)."(The Ruby Programming Language, Flanagan & Matsumoto, page 2).
Note this book, co-authored by the language creator, does not state "everything is an object".
a = 1
1 is an object, 'a' is a reference to the 1 object. If 'a' was an object on it's own, it would have an object_id of it's own. But:
1.object_id #=> 3
a.object_id #=> 3
Also, methods are not really objects (but you can turn them into objects if needed).
#Alex Wayne and #Jörg W Mittag answears are correct, but I would like to add that "not everything" important is an object. Like method and block are not objects, but can be converted to objects, with method method and proc respectively.

Why use symbols as hash keys in Ruby?

A lot of times people use symbols as keys in a Ruby hash.
What's the advantage over using a string?
E.g.:
hash[:name]
vs.
hash['name']
TL;DR:
Using symbols not only saves time when doing comparisons, but also saves memory, because they are only stored once.
Ruby Symbols are immutable (can't be changed), which makes looking something up much easier
Short(ish) answer:
Using symbols not only saves time when doing comparisons, but also saves memory, because they are only stored once.
Symbols in Ruby are basically "immutable strings" .. that means that they can not be changed, and it implies that the same symbol when referenced many times throughout your source code, is always stored as the same entity, e.g. has the same object id.
a = 'name'
a.object_id
=> 557720
b = 'name'
=> 557740
'name'.object_id
=> 1373460
'name'.object_id
=> 1373480 # !! different entity from the one above
# Ruby assumes any string can change at any point in time,
# therefore treating it as a separate entity
# versus:
:name.object_id
=> 71068
:name.object_id
=> 71068
# the symbol :name is a references to the same unique entity
Strings on the other hand are mutable, they can be changed anytime. This implies that Ruby needs to store each string you mention throughout your source code in it's separate entity, e.g. if you have a string "name" multiple times mentioned in your source code, Ruby needs to store these all in separate String objects, because they might change later on (that's the nature of a Ruby string).
If you use a string as a Hash key, Ruby needs to evaluate the string and look at it's contents (and compute a hash function on that) and compare the result against the (hashed) values of the keys which are already stored in the Hash.
If you use a symbol as a Hash key, it's implicit that it's immutable, so Ruby can basically just do a comparison of the (hash function of the) object-id against the (hashed) object-ids of keys which are already stored in the Hash. (much faster)
Downside:
Each symbol consumes a slot in the Ruby interpreter's symbol-table, which is never released.
Symbols are never garbage-collected.
So a corner-case is when you have a large number of symbols (e.g. auto-generated ones). In that case you should evaluate how this affects the size of your Ruby interpreter (e.g. Ruby can run out of memory and blow up if you generate too many symbols programmatically).
Notes:
If you do string comparisons, Ruby can compare symbols just by comparing their object ids, without having to evaluate them. That's much faster than comparing strings, which need to be evaluated.
If you access a hash, Ruby always applies a hash-function to compute a "hash-key" from whatever key you use. You can imagine something like an MD5-hash. And then Ruby compares those "hashed keys" against each other.
Every time you use a string in your code, a new instance is created - string creation is slower than referencing a symbol.
Starting with Ruby 2.1, when you use frozen strings, Ruby will use the same string object. This avoids having to create new copies of the same string, and they are stored in a space that is garbage collected.
Long answers:
https://web.archive.org/web/20180709094450/http://www.reactive.io/tips/2009/01/11/the-difference-between-ruby-symbols-and-strings
http://www.randomhacks.net.s3-website-us-east-1.amazonaws.com/2007/01/20/13-ways-of-looking-at-a-ruby-symbol/
https://www.rubyguides.com/2016/01/ruby-mutability/
The reason is efficiency, with multiple gains over a String:
Symbols are immutable, so the question "what happens if the key changes?" doesn't need to be asked.
Strings are duplicated in your code and will typically take more space in memory.
Hash lookups must compute the hash of the keys to compare them. This is O(n) for Strings and constant for Symbols.
Moreover, Ruby 1.9 introduced a simplified syntax just for hash with symbols keys (e.g. h.merge(foo: 42, bar: 6)), and Ruby 2.0 has keyword arguments that work only for symbol keys.
Notes:
1) You might be surprised to learn that Ruby treats String keys differently than any other type. Indeed:
s = "foo"
h = {}
h[s] = "bar"
s.upcase!
h.rehash # must be called whenever a key changes!
h[s] # => nil, not "bar"
h.keys
h.keys.first.upcase! # => TypeError: can't modify frozen string
For string keys only, Ruby will use a frozen copy instead of the object itself.
2) The letters "b", "a", and "r" are stored only once for all occurrences of :bar in a program. Before Ruby 2.2, it was a bad idea to constantly create new Symbols that were never reused, as they would remain in the global Symbol lookup table forever. Ruby 2.2 will garbage collect them, so no worries.
3) Actually, computing the hash for a Symbol didn't take any time in Ruby 1.8.x, as the object ID was used directly:
:bar.object_id == :bar.hash # => true in Ruby 1.8.7
In Ruby 1.9.x, this has changed as hashes change from one session to another (including those of Symbols):
:bar.hash # => some number that will be different next time Ruby 1.9 is ran
Re: what's the advantage over using a string?
Styling: its the Ruby-way
(Very) slightly faster value look ups since hashing a symbol is equivalent to hashing an integer vs hashing a string.
Disadvantage: consumes a slot in the program's symbol table that is never released.
I'd be very interested in a follow-up regarding frozen strings introduced in Ruby 2.x.
When you deal with numerous strings coming from a text input (I'm thinking of HTTP params or payload, through Rack, for example), it's way easier to use strings everywhere.
When you deal with dozens of them but they never change (if they're your business "vocabulary"), I like to think that freezing them can make a difference. I haven't done any benchmark yet, but I guess it would be close the symbols performance.

Resources