Different behavior of strings and symbols? - ruby

I was learning Ruby recently from the Koans and I noticed one thing about symbols and string objects. When I assigned the same symbol to two different variables, I found that the object_ids were the same.
2.1.1 :017 > symbol1 = :a
=> :a
2.1.1 :018 > symbol2 = :a
=> :a
2.1.1 :019 > symbol1.object_id
=> 361768
2.1.1 :020 > symbol2.object_id
=> 361768
Seeing this, I thought the same should be true for strings and integers too. But when I did the same with strings, the object ids ended up being different.
2.1.1 :021 > string1 = "test"
=> "test"
2.1.1 :022 > string2 = "test"
=> "test"
2.1.1 :023 > string1.object_id
=> 13977640
2.1.1 :024 > string2.object_id
=> 13932280
Why is the behavior of symbols and strings different?

You can think of symbols as self-referential interned strings: only one copy of a given symbol ever exists. The same is true of some other objects, such as Fixnum instances, booleans, and nil. Symbols are not duplicable and are not mutable. In Ruby 2.1 and earlier they are also never garbage collected (from Ruby 2.2 on, dynamically created symbols can be).
Strings, on the other hand, are garbage collected, are duplicable, and are mutable. Every time you declare a string, a new object is allocated.
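To make the difference concrete, here is a minimal sketch in plain Ruby (the variable names are only for illustration). It builds the strings with String.new so they are guaranteed to be separate objects even when frozen string literals are enabled.

```ruby
# Two references to the same symbol name point at one object.
symbol1 = :a
symbol2 = :a
puts symbol1.equal?(symbol2)      # true: same object

# Converting a string to a symbol finds the already interned symbol.
puts "a".to_sym.equal?(symbol1)   # true

# Strings built separately are distinct objects with equal contents.
string1 = String.new("test")
string2 = String.new("test")
puts string1 == string2           # true: same contents
puts string1.equal?(string2)      # false: different objects

# Strings are mutable, so changing one leaves the other untouched.
string1 << "!"
puts string1                      # test!
puts string2                      # test

# Symbols offer no mutating methods such as <<.
puts :a.respond_to?(:<<)          # false
```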

Related

Why is a hash literal called a hash literal in Ruby?

This is probably something you learn in programming 101.
Disclaimer: I have no formal programming training. I am self-taught.
To me, a "literal hash" sounds like what this website suggests: a third, edible hash called "corned beef hash".
In Ruby, you have two data types:
hash
hash literals
Why is one called a literal? Is it because you literally type out the associative array? The website above claims it is because the definition is inline. If so, why is the hash not also called literal when you can type it out like this:
states = Hash.new
states["CA"] = "California"
states["MA"] = "Massachusetts"
states["NY"] = "New York"
states["MA"].reverse #=> "sttesuhcassaM"
There is just one data type: Hash. Hash is a class. You can instantiate objects and use them:
h = Hash.new
h.store("CA", "California")
h["MA"] = "Massachusetts"
A literal is just a shortcut that lets you create objects of that class without explicitly using the class.
h = { "CA" => "California", "MA" => "Massachusetts" }
The same goes for arrays:
a = Array.new
a.push(1)
a << 2
Or, with an array literal:
a = [1, 2]
Your confusion stems from this misconception:
In Ruby, you have two data types:
hash
hash literals
Firstly, there are many more data structures in the Ruby core.
But secondly, there is no such thing as "literal hash". Hash literals refer to syntax sugar for defining hashes in place, aka literally.
# This is a hash literal
x = {foo: 42, bar: :baz}
# This is not a hash literal
x = Hash.new
x[:foo] = 42
x[:bar] = :baz
Both produce identical hashes. The only difference is that the literal is more convenient, while building the hash step by step is more dynamic.
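A quick way to check this yourself is the sketch below. It builds the same hash both ways and compares the results (the variable names are just for illustration).

```ruby
# Build the same hash with literal syntax and with Hash.new.
literal = { foo: 42, bar: :baz }

built = Hash.new
built[:foo] = 42
built[:bar] = :baz

puts literal.class           # Hash
puts built.class             # Hash
puts literal == built        # true: same keys and values
puts literal.equal?(built)   # false: two separate Hash objects
```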
A literal is a fixed value.
It cannot be edited, unless you assign it to a variable and then modify that (although then of course you are not actually modifying the literal).
https://en.wikipedia.org/wiki/Literal_(computer_programming)
So you can assign a literal to a variable, compare a variable to a literal, compare two literals, but you cannot in general modify a literal directly.
Edit: Note that when you appear to modify a literal, you are actually modifying a new object. Each time a string literal is evaluated a fresh object is created, whereas a variable keeps referring to the same object.
2.2.5 :001 > "foo".upcase!
=> "FOO"
2.2.5 :002 > "foo".object_id
=> 2204993280
2.2.5 :003 > "foo".upcase!.object_id
=> 2204964760
2.2.5 :004 > x = "foo"
=> "foo"
2.2.5 :005 > x.object_id
=> 2204927520
2.2.5 :006 > x.upcase!.object_id
=> 2204927520

Do you need to override hash and eql? when overriding the == operator in Ruby?

In languages like Java and C#, if you override the equality operators, you must override the hash method as well.
Whenever a.equals(b), then a.hashCode() must be the same as b.hashCode()
As far as I understand, some internal data structures in these languages rely on the condition above holding true in order to function correctly.
I wonder if the same is true in Ruby. Do you need to override the hash method of an object when overriding the == operator? I heard that you need to override eql? when overriding ==. What are the reasons behind those claims, and what would happen if you don't override them?
No, you don't need to override eql? and hash methods.
However, as tadman mentioned, you should override them. You don't know how eql? might be used, and if you don't override hash then you will get strange results if you use the object as a hash key. See this blog post.
Having said all that, you brought up an interesting point:
In Java and C#, you must override the hash method as well.
What happens if you don't override the hash method? Will it fail to compile, or is it a poor practice?
It feels like in Ruby there are very few hard and fast rules like this. I wonder if Ruby has a different paradigm compared to languages like C#, Java and C++. Perhaps the paradigm is different because Ruby is duck typed and does not have a separate compile process.
Ruby has 3 equality methods: ==, eql? and equal?. In the base Object class they all do the same thing (compare object identity), but more specific classes give them class-specific semantics.
What they compare depends on the developer who implemented the class, but there is nevertheless a convention.
== — Value comparison
True when two objects have the same value.
2.2.3 :011 > 5 == 5.0
=> true
2.2.3 :012 > 'test' == 'test'
=> true
2.2.3 :013 > { a: 10 } == { a: 10.0 }
=> true
2.2.3 :014 > :test == :test
=> true
2.2.3 :016 > ['a', :test, 10] == ['a', :test, 10.0]
=> true
 eql? — Value and type comparison
True when two objects have the same value and type.
2.2.3 :028 > 'test'.eql? 'test' # Strings
=> true
2.2.3 :029 > 5.eql? 5 # Fixnums
=> true
2.2.3 :030 > 5.eql? 5.0 # Fixnum & Float
=> false
2.2.3 :032 > { a: 10 }.eql?({ a: 10 }) # Hash
=> true
2.2.3 :033 > { a: 10 }.eql?({ a: 10.0 })
=> false
equal? — Reference comparison
True when two objects share the same memory reference. This method should never be overridden.
2.2.3 :017 > 'test'.equal? 'test'
=> false
# Each string is an independent object even if they share content
2.2.3 :018 > :test.equal? :test
=> true
# Symbols share reference if they have the same content
2.2.3 :019 > 1.equal? 1
=> true
2.2.3 :020 > [].equal? []
=> false
2.2.3 :021 > a = 'test'
=> "test"
2.2.3 :022 > b = a # b is a reference to the same object as a
=> "test"
2.2.3 :023 > b.equal? a
=> true
If you are only overriding ==, nothing forces you to override the hash method, but you should override eql? and hash together, because Hash relies on both to decide whether two keys are the same. Leave equal? alone.
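Here is a rough sketch of what goes wrong with hash keys. Point and LoosePoint are hypothetical classes made up for this example: Point overrides ==, eql? and hash, while LoosePoint overrides only ==.

```ruby
# Hypothetical value class that keeps ==, eql? and hash consistent.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Value comparison.
  def ==(other)
    other.is_a?(Point) && x == other.x && y == other.y
  end

  # Stricter comparison used by Hash when matching keys.
  def eql?(other)
    other.is_a?(Point) && x.eql?(other.x) && y.eql?(other.y)
  end

  # Objects that are eql? must return the same hash code.
  def hash
    [self.class, x, y].hash
  end
end

# Hypothetical class that overrides == only.
class LoosePoint
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  def ==(other)
    other.is_a?(LoosePoint) && x == other.x && y == other.y
  end
end

strict = { Point.new(1, 2) => "found" }
puts strict[Point.new(1, 2)].inspect               # "found"

loose = { LoosePoint.new(1, 2) => "found" }
puts LoosePoint.new(1, 2) == LoosePoint.new(1, 2)  # true
puts loose[LoosePoint.new(1, 2)].inspect           # nil: the key lookup misses
```

Nothing raises an error in the LoosePoint case. The lookup just quietly misses, which is why this is a convention rather than a hard rule.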

Encoding and decoding ruby symbols

I discovered this behavior of multi_json ruby gem:
2.1.0 :001 > require 'multi_json'
=> true
2.1.0 :002 > sym = :symbol
=> :symbol
2.1.0 :003 > sym.class
=> Symbol
2.1.0 :004 > res = MultiJson.load MultiJson.dump(sym)
=> "symbol"
2.1.0 :005 > res.class
=> String
Is this an appropriate way to store Ruby symbols? Does JSON provide some way to distinguish :symbol from "string"?
The simple answer is no. Most of the time it only really matters for hash keys, and there is a shortcut for those: symbolize_keys! (from ActiveSupport). The bottom line is that JSON does not understand symbols, just strings.
Since you are using MultiJson, you can also ask MultiJson to do this for you...
MultiJson.load('{"abc":"def"}', :symbolize_keys => true)
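If you want to try this without MultiJson, here is a small sketch using the json library that ships with Ruby. Its equivalent option is symbolize_names. The round_trip helper is made up for this example.

```ruby
require "json"

# Hypothetical helper: dump a Ruby object to JSON text and parse it back.
def round_trip(data, symbolize: false)
  JSON.parse(JSON.generate(data), symbolize_names: symbolize)
end

# Symbols are written out as ordinary JSON strings.
puts JSON.generate({ kind: :symbol })   # {"kind":"symbol"}

# By default both the key and the value come back as Strings.
plain = round_trip({ kind: :symbol })
puts plain["kind"].class                # String

# symbolize_names restores Symbol keys, but values stay Strings.
restored = round_trip({ kind: :symbol }, symbolize: true)
puts restored.keys.first.class          # Symbol
puts restored[:kind].class              # String

# A value that must be a Symbol again has to be converted explicitly.
puts restored[:kind].to_sym.inspect     # :symbol
```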

IRB (apparently) not inspecting hashes correctly

I'm seeing some odd behavior in IRB (Ruby 1.8.7) when printing hashes. If I initialize my hash with Hash.new, it appears that my hash is "evaluating" to an empty hash:
irb(main):024:0> h = Hash.new([])
=> {}
irb(main):025:0> h["test"]
=> []
irb(main):026:0> h["test"] << "blah"
=> ["blah"]
irb(main):027:0> h
=> {}
irb(main):028:0> puts h.inspect
{}
=> nil
irb(main):031:0> require 'pp'
=> true
irb(main):032:0> pp h
{}
=> nil
irb(main):033:0> h["test"]
=> ["blah"]
As you can see, the data is actually present in the hash, but trying to print or display it seems to fail. Initialization with a hash literal seems to fix this problem:
irb(main):050:0> hash = { 'test' => ['testval'] }
=> {"test"=>["testval"]}
irb(main):051:0> hash
=> {"test"=>["testval"]}
irb(main):053:0> hash['othertest'] = ['secondval']
=> ["secondval"]
irb(main):054:0> hash
=> {"othertest"=>["secondval"], "test"=>["testval"]}
The issue here is that invoking h["test"] doesn't actually insert a new key into the hash - it just returns the default value, which is the array that you passed to Hash.new.
1.8.7 :010 > a = []
=> []
1.8.7 :011 > a.object_id
=> 70338238506580
1.8.7 :012 > h = Hash.new(a)
=> {}
1.8.7 :013 > h["test"].object_id
=> 70338238506580
1.8.7 :014 > h["test"] << "blah"
=> ["blah"]
1.8.7 :015 > h.keys
=> []
1.8.7 :016 > h["bogus"]
=> ["blah"]
1.8.7 :017 > h["bogus"].object_id
=> 70338238506580
1.8.7 :019 > a
=> ["blah"]
The hash itself is still empty - you haven't assigned anything to it. The data isn't present in the hash - it's present in the array that is returned for missing keys in the hash.
It looks like you're trying to create a hash of arrays. To do so, I recommend you initialize like so:
h = Hash.new { |h,k| h[k] = [] }
Your version isn't working correctly for me, either. The reason why is a little complicated to understand. From the docs:
If obj is specified, this single object will be used for all default values.
I've added the bolding. The rest of the emphasis is as-is.
You're specifying that obj is [], and it's only a default value. It doesn't actually set the contents of the hash to that default value. So when you do h["blah"] << "test", you're really just asking it to return the default value itself (not a copy) and then appending "test" to that one shared array. It never goes into the hash at all. (I need to give Chris Heald credit for explaining this below.)
If instead you give it a block, it calls that block EVERY TIME you do a lookup on a non-existent entry of the hash. So you're not just creating one Array anymore. You're creating one for each entry of the hash.
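Here is a minimal side-by-side sketch of the two forms (the variable names are only for illustration):

```ruby
# Default object: every missing key returns the same shared array.
shared = Hash.new([])
shared["a"] << "blah"
p shared.keys                          # []  (nothing was stored)
p shared["b"]                          # ["blah"]  (the shared default was mutated)
puts shared["a"].equal?(shared["b"])   # true: one array for all missing keys

# Default block: runs for each missing key and stores a fresh array under it.
per_key = Hash.new { |hash, key| hash[key] = [] }
per_key["a"] << "blah"
p per_key.keys                         # ["a"]
p per_key["a"]                         # ["blah"]
p per_key["b"]                         # []  (a new, separate array)
```

Note that the block form only stores the array because the block itself assigns hash[key]. A block that merely returns [] would hand out a fresh array on each lookup but still leave the hash empty.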

How are named capture groups used in RE2 regexps?

On this page http://swtch.com/~rsc/regexp/regexp3.html it says that RE2 supports named expressions.
RE2 supports Python-style named captures (?P<name>expr), but not the
alternate syntaxes (?<name>expr) and (?'name'expr) used by .NET and
Perl.
ruby-1.9.2-p180 :003 > r = RE2::Regexp.compile("(?P<foo>.+) bla")
#=> #<RE2::Regexp /(?P<foo>.+) bla/>
ruby-1.9.2-p180 :006 > r = r.match("lalal bla")
#=> #<RE2::MatchData "lalal bla" 1:"lalal">
ruby-1.9.2-p180 :009 > r[1] #=> "lalal"
ruby-1.9.2-p180 :010 > r[:foo]
TypeError: can't convert Symbol into Integer
ruby-1.9.2-p180 :011 > r["foo"]
TypeError: can't convert String into Integer
But I'm not able to access the match with the name, so it seems like a useless implementation. Am I missing something?
Looking at your code output, it seems that you are using the Ruby re2 gem which I maintain.
As of the latest release (0.2.0), the gem does not support the underlying C++ re2 library's named capturing groups. The error you are seeing is due to the fact that any non-integer argument passed to MatchData#[] will simply be forwarded onto the default Array#[]. You can confirm this in an irb session like so:
irb(main):001:0> a = [1, 2, 3]
=> [1, 2, 3]
irb(main):002:0> a["bob"]
TypeError: can't convert String into Integer
from (irb):2:in `[]'
from (irb):2
from /Users/mudge/.rbenv/versions/1.9.2-p290/bin/irb:12:in `<main>'
irb(main):003:0> a[:bob]
TypeError: can't convert Symbol into Integer
from (irb):3:in `[]'
from (irb):3
from /Users/mudge/.rbenv/versions/1.9.2-p290/bin/irb:12:in `<main>'
I will endeavour to add the ability to reference captures by name as soon as possible and update this answer once a release has been made.
Update: I just released version 0.3.0 which now supports named groups like so:
irb(main):001:0> r = RE2::Regexp.compile("(?P<foo>.+) bla")
=> #<RE2::Regexp /(?P<foo>.+) bla/>
irb(main):002:0> r = r.match("lalal bla")
=> #<RE2::MatchData "lalal bla" 1:"lalal">
irb(main):003:0> r[1]
=> "lalal"
irb(main):004:0> r[:foo]
=> "lalal"
irb(main):005:0> r["foo"]
=> "lalal"
