Converting Ruby symbols to integers - ruby

I find it surprising that Ruby symbols can be typecasted to integers without errors.
So :a.to_i is legal. I was wondering what is the significance of this integer, is it a unique value specific to that symbol?

You shouldn't do this, as Symbol#to_i was removed in Ruby 1.9 and is thus not compatible going forward. Regardless, the docs say this about it:
Returns an integer that is unique for each symbol within a particular execution of a program.
It is roughly equivalent to calling object_id on the symbol, as they both end up calling the C function SYM2ID().

In 1.8.x symbols were immediate objects. Their implementation was fast, and, most of the time, small. But with that came a security concern regarding the lack of garbage collection.
The #to_i and #to_int methods returned a unique integer and were related to the internal implementation.
Symbols-as-immediates and the implicit and explicit integer conversions have all been removed in 1.9.x. You can of course get the object_id. It's interesting that in 1.8.x to_i and object_id did not return the same number.

Related

Freezing string literals in early Ruby versions

This question applies in particular to Ruby 1.9 and 2.1, where String literals can not be frozen automatically. In particular I am refering to this article, which suggests to freeze strings, so that repeated evaluation of the code does not create a new String object every time, which among other advantages is said to make the program perform better. As a concrete example, this article proposes the expression
("%09d".freeze % id).scan(/\d{3}/).join("/".freeze)
I want to use this concept in our project, and for testing purpose, I tried the following code:
3.times { x="abc".freeze; puts x.object_id }
In Ruby 2.3, this prints the same object ID every time. In JRuby 1.7, which corresponds on the language level to Ruby 1.9, it prints three different object IDs, although I have explicitly frozen the string.
Could somebody explain the reason for this, and how to use freeze properly in this situation?
In particular I am refering to this article, which suggests to freeze strings, so that repeated evaluation of the code does not create a new String object every time
That is not what Object#freeze does. As the name implies, it "freezes" the object, i.e. it disallows any further modification to the object's internal state. There is nothing in the documentation that even remotely suggests that Object#freeze performs some sort of de-duplication or interning.
You may be thinking of String#-#, but this does not exist in Ruby 2.1. It was only added in Ruby 2.3, and actually had different semantics then:
Ruby 2.3–2.4: returns self if self is already frozen, otherwise returns self.dup.freeze, i.e. a frozen duplicate of the string:
-str → str (frozen)
If the string is frozen, then return the string itself.
If the string is not frozen, then duplicate the string freeze it and return it.
Ruby 2.5+: returns self if self is already frozen, otherwise returns a frozen version of the string that is de-duplicated (i.e. it may be looked up in a cache of existing frozen strings and the existing version returned):
-str → str (frozen)
Returns a frozen, possibly pre-existing copy of the string.
The string will be deduplicated as long as it is not tainted, or has any instance variables set on it.
So, the article you linked to is wrong on three counts:
De-duplication is only performed for strings, not arbitrary objects.
De-duplication is not performed by freeze.
De-duplication is only performed by String#-# starting in Ruby 2.5.
There is also a fourth claim that is wrong in that article, although we can't really blame the author for that since the article is from 2016 and the decision was only changed in 2019: Ruby 3.0 will not have immutable string literals by default.
The one thing that is correct in that article is that the # frozen_string_literal: true pragma (or the corresponding command line option --enable-frozen-string-literal) will not only freeze all static string literals, it will also de-duplicate them.

Ruby object to_s what is the encoding of the object id?

In Ruby, the to_s on an object includes an encoding of the object's id.
[2] pry(main)> shape = Shape.new(4,4)
=> #<Shape:0x00007fac5eb6afc8 #num_sides=4, #side_length=4>
In the documentation it says
Returns a string representing obj. The default to_s prints the object’s class and an encoding of the object id.
https://apidock.com/ruby/Object/to_s
In the example above, the encoding of the object id is 0x00007fac5eb6afc8.
In How does object_id assignment work? they explain
In MRI the object_id of an object is the same as the VALUE that represents the object on the C level.
So I compared to the object_id and it is not the same as the encoding of the object id.
[2] pry(main)> shape = Shape.new(4,4)
=> #<Shape:0x00007fac5eb6afc8 #num_sides=4, #side_length=4>
[3] pry(main)> shape.object_id
=> 70189150066660
What exactly is the encoding of the object id? It does not appear to be the object_id.
Think of the object_id, or __id__ as the "pointer" for the object. It is not technically a pointer, but does contain a unique value that can be used to retrieve the internal C VALUE.
There are patterns to the value it has for some data types, as you can see with its hexadecimal representation with to_s. I am will not go into all the details, as there are already numerous answers on SO explaining, and already linked from comments, but integers (up to a FIXNUM_MAX, have predictable values, and special constants like true, false, and nil will always have the same object_id in every run.
To put simply, it is nothing more than a number, and shown as a hexadecimal (base 16) value, not any actual "encoding" or cypher.
Going to expand upon this a bit more in light of your latest edits to the question. As you posted, the hexadecimal number you see in to_s is the value of the internal C VALUE of the object. VALUE is a C data type (unsigned, pointer size number) that every Ruby object is represented as in C code. As #Stefan pointed out in a comment, for non-integer types (I speak only for MRI version), it is twice the value of the object_id. Not that you probably care, but you can shift the bits of an integer to predict the value for those.
Therefore, using you example.
A value of 0x00007fac5eb6afc8 is simple hexadecimal notation for a number. It uses a base 16 counting system as opposed to the base 10 decimal system we are more used to in everyday life. It is simply a different way of looking at the same number.
So, using that logic.
a = 0x00007fac5eb6afc8
#=> 140378300133320 # Decimal representation
a /= 2 # Remember, non-integers are half of this value
#=> 70189150066660 # Your object_id
The best answer you can get is: You don't know, and you shouldn't need to.
Ruby guarantees exactly three things about object IDs:
An object has the same ID during its lifetime.
No two objects have the same ID at the same time.
IDs are integers.
In particular, this means that you cannot rely on a specific object having a specific ID (for example, nil having ID 8). It also means that IDs can be re-used. You should think of it as nothing but opaque identifier.
And, as you quoted, the default Object#to_s uses "some" encoding of the ID.
And that is all you know, and all you should ever rely on. In particular, you should never try to parse IDs or Object#to_s.
So, the ID part of Object#to_s is "some unspecified encoding" of the ID, which itself is "some opaque identifier".
Everything else is deliberately left unspecified, so that different implementations can make different choices that make sense for their specific needs. For example, it would be stupid to tie object IDs to memory addresses, because implementations like JRuby, Opal, IronPython, MagLev, and Topaz run on platforms where the concept of "memory address" doesn't even exist! And Rubinius uses a moving garbage collector, where objects can move around in memory and thus their address changes.

Syntax sugar for variable replacement with `round`

We have do-and-replace functions like map!, reject!, reverse!, rotate!. Also we have binary operations in short form like +=, -=.
Do we have something for mathematical round? We need to use a = a.round, and it's a bit weird for me to repeat the variable name. Do you know how to shorten it?
OK, smart guys have already explained, why there is no syntactic sugar for Float#round. Just out of curiosity I’m gonna show, how you might implement this sugar yourself [partially]. Since Float class has no ~# method defined, and you do rounding quite often, you might monkeypatch Float class:
class Float
def ~#
self.round # self is redundant, left just for clarity
end
end
or, in this simple case, just (credits to #sawa):
alias_method :~#, :round
and now:
~5.2
#⇒ 5
a = 2.45 && ~a
#⇒ 2
Since Numerics are immutable, it’s still impossible to modify it inplace, but the above might save you four keyboard hits per rounding.
As for destructive methods, it is impossible since numerals are immutable, and it would not make sense. Would you want a numeral 5.2 that behaves as 5?
As for syntax sugar, it would be a mess if every single method had one. So there isn't. And since syntax sugar is defined in the core level, you cannot do anything in an ordinary Ruby script to create a new one.
Ruby's numeric types are immutable: they are value objects. Therefore you won't find any methods that mutate a number in place.
Because the numeric types are immutable, certain optimizations are possible that would not be possible with mutable numbers. In c-ruby, for example, a reference, which may point to any kind of object, is normally a pointer to an object. But if the reference is to a Fixnum, then the reference contains the integer itself, rather than pointing to an instance of Fixnum. Ruby does a number of magic tricks to hide this optimization, making it appear that an integer really is an instance of a Fixnum.
To make numbers mutable would make this optimization impossible, so I don't expect that Ruby will ever have mutable numeric types.

Is it possible to redefine 0 in ruby?

I'm not actually going to use this in anything in case it does actually work but is it possible to redefine 0 to act as 1 in Ruby and 1 to act as 0? Where does FixNum actually hold its value?
No, I don't think so. I'd be very suprised if you managed to. If you start overriding Fixnum's methods/operators, you maaaybe might get near that (i.e. override + so that 1+5 => 5, 0+5 => 6 etc), but you will not get full replacement of literal '0' with value 1. At least marshalling to native would expose the real 0 value of the Fixnum(0).
To be honest, I'm not really sure if you can even override the core operations like + op on a Fixnum. That could break so many things..
As far as I remember from 1.8.3 source, simple integers and doubles are held right inside a 'value' and are copied all around *). There is no singular "0", "1" or "1000" value. There is no extra dereference that would allow you to swap all the values with one shot. I doubt it changed in 1.9 and I doubt anyone got any weird idea about that in 2.0. But I don't actually know. Still, that would be strange. No platform I know interns integers and floatings.. Strings, sometimes array literals, but numbers?
So, sorry, no #define true false jokes :)
--
*) clarification from Jörg W Mittag (thanks, this is exactly what I was referring to):
(..) Fixnums do not have a place in memory, their pointer value is "magic" (in that it cannot possibly occur in a Ruby program) and treated specially by the runtime system. Read up on "tagged pointer representation", e.g. here.
Assignment does not alias Fixnum objects. There is effectively only one Fixnum object instance for any given integer value, so, for example, you cannot add a singleton method to a Fixnum. Any attempt to add a singleton method to a Fixnum object will raise a TypeError. Source
That pretty much means you can't edit a Fixnum and therefor not redefine 0 or 1 in native ruby.
Though as these Fixnums are also Objects they have unique object id's that cleary reference them somewhere in the memory. See BasicObject#__id__
If you can locate the memory space where 0 and 1 objects are and switch these, you should have effectivle switched 0 and 1 behavior in ruby as now either will reference the other object.
So to answer your question: No redefining Fixnums is not possible in Ruby, switching their behaviour should be possible though.

Why use symbols as hash keys in Ruby?

A lot of times people use symbols as keys in a Ruby hash.
What's the advantage over using a string?
E.g.:
hash[:name]
vs.
hash['name']
TL;DR:
Using symbols not only saves time when doing comparisons, but also saves memory, because they are only stored once.
Ruby Symbols are immutable (can't be changed), which makes looking something up much easier
Short(ish) answer:
Using symbols not only saves time when doing comparisons, but also saves memory, because they are only stored once.
Symbols in Ruby are basically "immutable strings" .. that means that they can not be changed, and it implies that the same symbol when referenced many times throughout your source code, is always stored as the same entity, e.g. has the same object id.
a = 'name'
a.object_id
=> 557720
b = 'name'
=> 557740
'name'.object_id
=> 1373460
'name'.object_id
=> 1373480 # !! different entity from the one above
# Ruby assumes any string can change at any point in time,
# therefore treating it as a separate entity
# versus:
:name.object_id
=> 71068
:name.object_id
=> 71068
# the symbol :name is a references to the same unique entity
Strings on the other hand are mutable, they can be changed anytime. This implies that Ruby needs to store each string you mention throughout your source code in it's separate entity, e.g. if you have a string "name" multiple times mentioned in your source code, Ruby needs to store these all in separate String objects, because they might change later on (that's the nature of a Ruby string).
If you use a string as a Hash key, Ruby needs to evaluate the string and look at it's contents (and compute a hash function on that) and compare the result against the (hashed) values of the keys which are already stored in the Hash.
If you use a symbol as a Hash key, it's implicit that it's immutable, so Ruby can basically just do a comparison of the (hash function of the) object-id against the (hashed) object-ids of keys which are already stored in the Hash. (much faster)
Downside:
Each symbol consumes a slot in the Ruby interpreter's symbol-table, which is never released.
Symbols are never garbage-collected.
So a corner-case is when you have a large number of symbols (e.g. auto-generated ones). In that case you should evaluate how this affects the size of your Ruby interpreter (e.g. Ruby can run out of memory and blow up if you generate too many symbols programmatically).
Notes:
If you do string comparisons, Ruby can compare symbols just by comparing their object ids, without having to evaluate them. That's much faster than comparing strings, which need to be evaluated.
If you access a hash, Ruby always applies a hash-function to compute a "hash-key" from whatever key you use. You can imagine something like an MD5-hash. And then Ruby compares those "hashed keys" against each other.
Every time you use a string in your code, a new instance is created - string creation is slower than referencing a symbol.
Starting with Ruby 2.1, when you use frozen strings, Ruby will use the same string object. This avoids having to create new copies of the same string, and they are stored in a space that is garbage collected.
Long answers:
https://web.archive.org/web/20180709094450/http://www.reactive.io/tips/2009/01/11/the-difference-between-ruby-symbols-and-strings
http://www.randomhacks.net.s3-website-us-east-1.amazonaws.com/2007/01/20/13-ways-of-looking-at-a-ruby-symbol/
https://www.rubyguides.com/2016/01/ruby-mutability/
The reason is efficiency, with multiple gains over a String:
Symbols are immutable, so the question "what happens if the key changes?" doesn't need to be asked.
Strings are duplicated in your code and will typically take more space in memory.
Hash lookups must compute the hash of the keys to compare them. This is O(n) for Strings and constant for Symbols.
Moreover, Ruby 1.9 introduced a simplified syntax just for hash with symbols keys (e.g. h.merge(foo: 42, bar: 6)), and Ruby 2.0 has keyword arguments that work only for symbol keys.
Notes:
1) You might be surprised to learn that Ruby treats String keys differently than any other type. Indeed:
s = "foo"
h = {}
h[s] = "bar"
s.upcase!
h.rehash # must be called whenever a key changes!
h[s] # => nil, not "bar"
h.keys
h.keys.first.upcase! # => TypeError: can't modify frozen string
For string keys only, Ruby will use a frozen copy instead of the object itself.
2) The letters "b", "a", and "r" are stored only once for all occurrences of :bar in a program. Before Ruby 2.2, it was a bad idea to constantly create new Symbols that were never reused, as they would remain in the global Symbol lookup table forever. Ruby 2.2 will garbage collect them, so no worries.
3) Actually, computing the hash for a Symbol didn't take any time in Ruby 1.8.x, as the object ID was used directly:
:bar.object_id == :bar.hash # => true in Ruby 1.8.7
In Ruby 1.9.x, this has changed as hashes change from one session to another (including those of Symbols):
:bar.hash # => some number that will be different next time Ruby 1.9 is ran
Re: what's the advantage over using a string?
Styling: its the Ruby-way
(Very) slightly faster value look ups since hashing a symbol is equivalent to hashing an integer vs hashing a string.
Disadvantage: consumes a slot in the program's symbol table that is never released.
I'd be very interested in a follow-up regarding frozen strings introduced in Ruby 2.x.
When you deal with numerous strings coming from a text input (I'm thinking of HTTP params or payload, through Rack, for example), it's way easier to use strings everywhere.
When you deal with dozens of them but they never change (if they're your business "vocabulary"), I like to think that freezing them can make a difference. I haven't done any benchmark yet, but I guess it would be close the symbols performance.

Resources