Why is :key.hash != 'key'.hash in Ruby? - ruby

I'm learning Ruby right now for the Rhodes mobile application framework and came across this problem: Rhodes' HTTP client parses JSON responses into Ruby data structures, e.g.
puts #params # prints {"body"=>{"results"=>[]}}
Since the key "body" is a string here, my first attempt #params[:body] failed (is nil) and instead it must be #params['body']. I find this most unfortunate.
Can somebody explain the rationale why strings and symbols have different hashes, i.e. :body.hash != 'body'.hash in this case?

Symbols and strings serve two different purposes.
Strings are your good old familiar friends: mutable and garbage-collectable. Every time you use a string literal or #to_s method, a new string is created. You use strings to build HTML markup, output text to screen and whatnot.
Symbols, on the other hand, are different. Each symbol exists only in one instance and it exists always (i.e, it is not garbage-collected). Because of that you should make new symbols very carefully (String#to_sym and :'' literal). These properties make them a good candidate for naming things. For example, it's idiomatic to use symbols in macros like attr_reader :foo.
If you got your hash from an external source (you deserialized a JSON response, for example) and you want to use symbols to access its elements, then you can either use HashWithIndifferentAccess (as others pointed out), or call helper methods from ActiveSupport:
require 'active_support/core_ext'
h = {"body"=>{"results"=>[]}}
h.symbolize_keys # => {:body=>{"results"=>[]}}
h.stringify_keys # => {"body"=>{"results"=>[]}}
Note that it'll only touch top level and will not go into child hashes.

Symbols and Strings are never ==:
:foo == 'foo' # => false
That's a (very reasonable) design decision. After all, they have different classes, methods, one is mutable the other isn't, etc...
Because of that, it is mandatory that they are never eql?:
:foo.eql? 'foo' # => false
Two objects that are not eql? typically don't have the same hash, but even if they did, the Hash lookup uses hash and then eql?. So your question really was "why are symbols and strings not eql?".
Rails uses HashWithIndifferentAccess that accesses indifferently with strings or symbols.

In Rails, the params hash is actually a HashWithIndifferentAccess rather than a standard ruby Hash object. This allows you to use either strings like 'action' or symbols like :action to access the contents.
You will get the same results regardless of what you use, but keep in mind this only works on HashWithIndifferentAccess objects.
Copied from : Params hash keys as symbols vs strings

Related

What is the difference between being immutable and the fact that there can only be one instance of a Symbol?

I'm reading Eloquent Ruby, and am on Chapter 6 on Symbols. Some excerpts:
"There can only ever be one instance of any given symbol. If I mention :all twice in my code, it is always the same :all."
a = :all
b = :all
puts a.object_id, b.object_id # same objects
"Another aspect of symbols that makes them so well suited to their chosen career is that symbols are immutable - once you create that :all symbol, it will be :all until the end of time (or at least until your Ruby interpreter exits)"
What is the difference between being immutable and the fact that there can only be one instance of you?
By the way, I would like to write the previous sentence more accurately: "What is the difference between a class being immutable and the fact that there can only be one instance of the class?" Is class the right word to insert there?
How would you even go about trying to mutate a symbol, they don't seem to hold values like other variables?
Immutable means that an object cannot be changed. In Ruby, symbols are immutable. To make a symbol mutable, you have to perform type conversion to a string, which is mutable.
a = :mystring
a = a.to_s
=> "mystring"
For proof that a symbol is immutable, you can call the frozen? property on it.
a.frozen?
=> true
Note that symbols cannot be unfrozen unlike strings which have an unfreeze method.
For object ids
In Ruby, the object_id of an object is the same as the VALUE that represents the object on the C level. For most objects, this points to a location in memory where the object data is stored. This varies over time because it depends on where the system decided to allocate its memory.
Symbols have the same object id because they are meant to represent a SINGLE value.
To check this out, let's type to the console the same symbol multiple times.
:z.object_id
=> 636328
:z.object_id
=> 636328
:z.object_id
=> 636328
Now, let's try the same thing only with strings
"z".object_id
=> 21237740
"z".object_id
=> 24355380
As you can see, here we have two references to the string z, both of which are different objects. Thus, they have different object_ids.
This also means that symbols can save quite a bit of memory, especially if we are dealing with big data. Because symbols are the same object, it's faster to compare them then it is strings. Strings require comparing the values instead of the object ids.
Your sentence is fine; you're not sure of the common phrase used to describe a class with only one instance. I'll explain that as I go along.
An object that is immutable cannot change through any operations done on it. This means that any operation that would change a symbol would generate a new one instead.
:foo.object_id # 1520028
:foo.upcase.object_id # 70209716662240
:foo.capitalize.object_id # 70209719120060
You can certainly write objects that are immutable, or make them immutable (with some caveats) via freeze, but you can always create a new instance of them.
f = "foo"
f.freeze
f1 = "foo"
puts f.object_id == f1.object_id # false
An object that only ever has one instance of itself is considered to be a singleton.
If there's only one instance of it, then you only store it in memory once.
If you attempt to create it, you only get the previously existing object back.

ruby symbol as key, but can't get value from hash

I'm doing some update on other one's code and now I have a hash, it's like:
{"instance_id"=>"74563c459c457b2288568ec0a7779f62", "mem_quota"=>536870912, "disk_quota"=>2147483648, "mem_usage"=>59164.0, "cpu_usage"=>0.1, "disk_usage"=>6336512}
and I want to get the value by symbol as a key, for example: :mem_quota, but failed.
The code is like:
instance[:mem_usage].to_f
but it returns nothing. Is there any reason can cause this problem?
Use instance["mem_usage"] instead since the hash is not using symbols.
The other explanations are correct, but to give a broader background:
You are probably used to working within Rails where a very specific variant of Hash, called HashWithIndifferentAccess, is used for things like params. This particular class works like a standard ruby Hash, except when you access keys you are allowed to use either Symbols or Strings. The standard Ruby Hash, and generally speaking, Hash implementations in other languages, expect that to access an element, the key used for later access should be an object of the same class and value as the key used to store the object. HashWithIndifferentAccess is a Rails convenience class provided via the Active Support libraries. You are free to use them yourself, but they have first be brought in by requiring them.
HashWithIndifferentAccess just does the conversion for you at access time from string to symbol.
So, for your case, instance["mem_usage"].to_f should work.
You need HashWithIndifferentAccess.
require 'active_support/core_ext'
h1 = {"instance_id"=>"74563c459c457b2288568ec0a7779f62", "mem_quota"=>536870912,
"disk_quota"=>2147483648, "mem_usage"=>59164.0, "cpu_usage"=>0.1,
"disk_usage"=>6336512}
h2 = h1.with_indifferent_access
h1[:mem_usage] # => nil
h1["mem_usage"] # => 59164.0
h2[:mem_usage] # => 59164.0
h2["mem_usage"] # => 59164.0
Also, there are the symbolize_keys and stringify_keys options that may be of help. The method names are self-descriptive enough, I believe.
Clearly the keys of your hash are strings because they have double quotes around them. Therefore you will need to access the keys with instance["mem_usage"] or you will need to build a new hash with symbols as the keys first.
If you use Rails with ActiveSupport, then do use HashWithIndifferentAccess for flexibility in accessing hash with either string or symbol.
hash = HashWithIndifferentAccess.new({
"instance_id"=>"74563c459c457b2288568ec0a7779f62",
"mem_quota"=>536870912, "disk_quota"=>2147483648,
"mem_usage"=>59164.0,
"cpu_usage"=>0.1,
"disk_usage"=>6336512
})
hash[:mem_usage] # => 59164.0
hash["mem_usage"] # => 59164.0

Why use symbols as hash keys in Ruby?

A lot of times people use symbols as keys in a Ruby hash.
What's the advantage over using a string?
E.g.:
hash[:name]
vs.
hash['name']
TL;DR:
Using symbols not only saves time when doing comparisons, but also saves memory, because they are only stored once.
Ruby Symbols are immutable (can't be changed), which makes looking something up much easier
Short(ish) answer:
Using symbols not only saves time when doing comparisons, but also saves memory, because they are only stored once.
Symbols in Ruby are basically "immutable strings" .. that means that they can not be changed, and it implies that the same symbol when referenced many times throughout your source code, is always stored as the same entity, e.g. has the same object id.
a = 'name'
a.object_id
=> 557720
b = 'name'
=> 557740
'name'.object_id
=> 1373460
'name'.object_id
=> 1373480 # !! different entity from the one above
# Ruby assumes any string can change at any point in time,
# therefore treating it as a separate entity
# versus:
:name.object_id
=> 71068
:name.object_id
=> 71068
# the symbol :name is a references to the same unique entity
Strings on the other hand are mutable, they can be changed anytime. This implies that Ruby needs to store each string you mention throughout your source code in it's separate entity, e.g. if you have a string "name" multiple times mentioned in your source code, Ruby needs to store these all in separate String objects, because they might change later on (that's the nature of a Ruby string).
If you use a string as a Hash key, Ruby needs to evaluate the string and look at it's contents (and compute a hash function on that) and compare the result against the (hashed) values of the keys which are already stored in the Hash.
If you use a symbol as a Hash key, it's implicit that it's immutable, so Ruby can basically just do a comparison of the (hash function of the) object-id against the (hashed) object-ids of keys which are already stored in the Hash. (much faster)
Downside:
Each symbol consumes a slot in the Ruby interpreter's symbol-table, which is never released.
Symbols are never garbage-collected.
So a corner-case is when you have a large number of symbols (e.g. auto-generated ones). In that case you should evaluate how this affects the size of your Ruby interpreter (e.g. Ruby can run out of memory and blow up if you generate too many symbols programmatically).
Notes:
If you do string comparisons, Ruby can compare symbols just by comparing their object ids, without having to evaluate them. That's much faster than comparing strings, which need to be evaluated.
If you access a hash, Ruby always applies a hash-function to compute a "hash-key" from whatever key you use. You can imagine something like an MD5-hash. And then Ruby compares those "hashed keys" against each other.
Every time you use a string in your code, a new instance is created - string creation is slower than referencing a symbol.
Starting with Ruby 2.1, when you use frozen strings, Ruby will use the same string object. This avoids having to create new copies of the same string, and they are stored in a space that is garbage collected.
Long answers:
https://web.archive.org/web/20180709094450/http://www.reactive.io/tips/2009/01/11/the-difference-between-ruby-symbols-and-strings
http://www.randomhacks.net.s3-website-us-east-1.amazonaws.com/2007/01/20/13-ways-of-looking-at-a-ruby-symbol/
https://www.rubyguides.com/2016/01/ruby-mutability/
The reason is efficiency, with multiple gains over a String:
Symbols are immutable, so the question "what happens if the key changes?" doesn't need to be asked.
Strings are duplicated in your code and will typically take more space in memory.
Hash lookups must compute the hash of the keys to compare them. This is O(n) for Strings and constant for Symbols.
Moreover, Ruby 1.9 introduced a simplified syntax just for hash with symbols keys (e.g. h.merge(foo: 42, bar: 6)), and Ruby 2.0 has keyword arguments that work only for symbol keys.
Notes:
1) You might be surprised to learn that Ruby treats String keys differently than any other type. Indeed:
s = "foo"
h = {}
h[s] = "bar"
s.upcase!
h.rehash # must be called whenever a key changes!
h[s] # => nil, not "bar"
h.keys
h.keys.first.upcase! # => TypeError: can't modify frozen string
For string keys only, Ruby will use a frozen copy instead of the object itself.
2) The letters "b", "a", and "r" are stored only once for all occurrences of :bar in a program. Before Ruby 2.2, it was a bad idea to constantly create new Symbols that were never reused, as they would remain in the global Symbol lookup table forever. Ruby 2.2 will garbage collect them, so no worries.
3) Actually, computing the hash for a Symbol didn't take any time in Ruby 1.8.x, as the object ID was used directly:
:bar.object_id == :bar.hash # => true in Ruby 1.8.7
In Ruby 1.9.x, this has changed as hashes change from one session to another (including those of Symbols):
:bar.hash # => some number that will be different next time Ruby 1.9 is ran
Re: what's the advantage over using a string?
Styling: its the Ruby-way
(Very) slightly faster value look ups since hashing a symbol is equivalent to hashing an integer vs hashing a string.
Disadvantage: consumes a slot in the program's symbol table that is never released.
I'd be very interested in a follow-up regarding frozen strings introduced in Ruby 2.x.
When you deal with numerous strings coming from a text input (I'm thinking of HTTP params or payload, through Rack, for example), it's way easier to use strings everywhere.
When you deal with dozens of them but they never change (if they're your business "vocabulary"), I like to think that freezing them can make a difference. I haven't done any benchmark yet, but I guess it would be close the symbols performance.

What does :this means in Ruby on Rails?

I'm new to the Ruby and Ruby on Rails world. I've read some guides, but i've some trouble with the following syntax.
I think that the usage of :condition syntax is used in Ruby to define a class attribute with some kind of accessor, like:
class Sample
attr_accessor :condition
end
that implicitly declares the getter and setter for the "condition" property.
While i was looking at some Rails sample code, i found the following examples that i don't fully understand.
For example:
#post = Post.find(params[:id])
Why it's accessing the id attribute with this syntax, instead of:
#post = Post.find(params[id])
Or, for example:
#posts = Post.find(:all)
Is :all a constant here? If not, what does this code really means? If yes, why the following is not used:
#posts = Post.find(ALL)
Thanks
A colon before text indicates a symbol in Ruby. A symbol is kind of like a constant, but it's almost as though a symbol receives a unique value (that you don't care about) as its constant value.
When used as a hash index, symbols are almost (but not exactly) the same as using strings.
Also, you can read "all" from :all by calling to_s on the symbol. If you had a constant variable ALL, there would be no way to determine that it meant "all" other than looking up its value. This is also why you can use symbols as arguments to meta-methods like attr_accessor, attr_reader, and the like.
You might want to read up on Ruby symbols.
:all is a symbol. Symbols are Ruby's version of interned strings. You can think of it like this: There is an invisible global table called symbols which has String keys and Fixnum values. Any string can be converted into a symbol by calling .to_sym, which looks for the string in the table. If the string is already in the table, it returns the the Fixnum, otherwise, it enters it into the table and returns the next Fixnum. Because of this, symbols are treated at run-time like Fixnums: comparison time is constant (in C parlance, comparisons of symbols can be done with == instead of strcmp)
You can verify this by looking at the object_id of objects; when two thing's object_ids are the same, they're both pointing at the same object.
You can see that you can convert two strings to symbols, and they'll both have the same object id:
"all".to_sym.object_id == "all".to_sym.object_id #=> true
"all".to_sym.object_id == :all.object_id #=> true
But the converse is not true: (each call to Symbol#to_s will produce a brand new string)
:all.to_s.object_id == :all.to_s.object_id #=> false
Don't look at symbols as a way of saving memory. Look at them as indicating that the string ought to be immutable. 13 Ways of Looking at a Ruby Symbol gives a variety of ways of looking at a symbol.
To use a metaphor: symbols are for multiple-choice tests, strings are for essay questions.
This has nothing to do with Rails, it's just Ruby's Symbols. :all is a symbol which is effectively just a basic string.

Why don't more projects use Ruby Symbols instead of Strings?

When I first started reading about and learning ruby, I read something about the power of ruby symbols over strings: symbols are stored in memory only once, while strings are stored in memory once per string, even if they are the same.
For instance: Rails' params Hash in the Controller has a bunch of keys as symbols:
params[:id] or
params[:title]...
But other decently sized projects such as Sinatra and Jekyll don't do that:
Jekyll:
post.data["title"] or
post.data["tags"]...
Sinatra:
params["id"] or
params["title"]...
This makes reading new code a little tricky, and makes it hard to transfer code around and to figure out why using symbols isn't working. There are many more examples of this and it's kind of confusing. Should we or shouldn't we be using symbols in this case? What are the advantages of symbols and should we be using them here?
In ruby, after creating the AST, each symbol is represented as a unique integer. Having symbols as hash keys makes the computing a lot faster, as the main operation is comparison.
Symbols are not garbage collected AFAIK, so that might be a thing to watch out for, but except for that they really are great as hash keys.
One reason for the usage of strings may be the usage of yaml to define the values.
require 'yaml'
data = YAML.load(<<-data
one:
title: one
tag: 1
two:
title: two
tag: 2
data
) #-> {"one"=>{"title"=>"one", "tag"=>1}, "two"=>{"title"=>"two", "tag"=>2}}
You may use yaml to define symbol-keys:
require 'yaml'
data = YAML.load(<<-data
:one:
:title: one
:tag: 1
:two:
:title: two
:tag: 2
data
) #-> {:one=>{:title=>"one", :tag=>1}, :two=>{:title=>"two", :tag=>2}}
But in the yaml-definition symbols look a bit strange, strings looks more natural.
Another reason for strings as keys: Depending on the use case, it can be reasonable to sort by keys, but you can't sort symbols (at least not without a conversion to strings).
The main difference is that multiple symbols representing a single value are identical whereas this is not true with strings. For example:
irb(main):007:0> :test.object_id
=> 83618
irb(main):008:0> :test.object_id
=> 83618
irb(main):009:0> :test.object_id
=> 83618
3 references to the symbol :test, all the same object.
irb(main):010:0> "test".object_id
=> -605770378
irb(main):011:0> "test".object_id
=> -605779298
irb(main):012:0> "test".object_id
=> -605784948
3 references to the string "test", all different objects.
This means that using symbols can potentially save a good bit of memory depending on the application. It is also faster to compare symbols for equality since they are the same object, comparing identical strings is much slower since the string values need to be compared instead of just the object ids.
I usually use strings for almost everything except things like hash keys where I really want a unique identifier, not a string

Resources