Strings that compare equal don't find same objects in Hash - ruby

I have two strings that appear equal:
context = "Marriott International World’s Most ADMIRED Lodging Company by FORTUNE for 14th yr. via #FortuneMagazine http://cnnmon.ie/1kcFZSQ"
slice_str = context.slice(105,24) # => "http://cnnmon.ie/1kcFZSQ"
str = "http://cnnmon.ie/1kcFZSQ"
slice_str == str # => true
slice_str.eql? str # => true
But when I look up values in a hash where the keys are the strings, they do not return the same thing in Ruby 2.1.0 and Ruby 2.1.1:
redirects = {"http://cnnmon.ie/1kcFZSQ"=>""}
redirects.key?(slice_str) # => false
redirects.key?(str) # => true
What explanation is there for this behaviour? Ruby 1.9.3 works as expected.

This was a bug in Ruby versions before 2.1.3. Here is the behaviour in 2.1.2:
$ rvm use 2.1.2
Using /Users/richniles/.rvm/gems/ruby-2.1.2
$ irb
2.1.2 :001 > context = "Marriott International World’s Most ADMIRED Lodging Company by FORTUNE for 14th yr. via #FortuneMagazine http://cnnmon.ie/1kcFZSQ"
=> "Marriott International World’s Most ADMIRED Lodging Company by FORTUNE for 14th yr. via #FortuneMagazine http://cnnmon.ie/1kcFZSQ"
2.1.2 :002 > slice_str = context.slice(105,24) # => "http://cnnmon.ie/1kcFZSQ"
=> "http://cnnmon.ie/1kcFZSQ"
2.1.2 :003 > str = "http://cnnmon.ie/1kcFZSQ"
=> "http://cnnmon.ie/1kcFZSQ"
2.1.2 :004 > redirects = {"http://cnnmon.ie/1kcFZSQ"=>""}
=> {"http://cnnmon.ie/1kcFZSQ"=>""}
2.1.2 :005 > redirects.key?(slice_str)
=> false
2.1.2 :006 > redirects.key?(str)
=> true
The same session in Ruby 2.1.3 returns true for both lookups:
$ rvm use 2.1.3
Using /Users/richniles/.rvm/gems/ruby-2.1.3
$ irb
2.1.3 :001 > context = "Marriott International World’s Most ADMIRED Lodging Company by FORTUNE for 14th yr. via #FortuneMagazine http://cnnmon.ie/1kcFZSQ"
=> "Marriott International World’s Most ADMIRED Lodging Company by FORTUNE for 14th yr. via #FortuneMagazine http://cnnmon.ie/1kcFZSQ"
2.1.3 :002 > slice_str = context.slice(105,24) # => "http://cnnmon.ie/1kcFZSQ"
=> "http://cnnmon.ie/1kcFZSQ"
2.1.3 :003 > str = "http://cnnmon.ie/1kcFZSQ"
=> "http://cnnmon.ie/1kcFZSQ"
2.1.3 :004 > redirects = {"http://cnnmon.ie/1kcFZSQ"=>""}
=> {"http://cnnmon.ie/1kcFZSQ"=>""}
2.1.3 :005 > redirects.key?(slice_str)
=> true
2.1.3 :006 > redirects.key?(str)
=> true

A Hash looks keys up by their #hash value first (and only then checks eql?), so two keys with different hashes are never treated as the same key.
In Ruby 2.1.1 the two strings in your example have different hashes:
slice_str.hash == str.hash # => false in Ruby 2.1.1
It's totally unclear to me, though, why the sliced string has a different hash.
Even stranger: if you test the code on an ASCII-only string (your string, but with ' instead of ’), the hashes are the same!
It's really weird.
The only workaround I've found (though it doesn't look elegant at all):
slice_str = context.slice(105,24).chars.join # split it into separate chars and then join back
p str.hash == slice_str.hash # true now
p redirects.key?(slice_str) # true now
UPD: Oops, I hadn't seen the link to the bug in the comments above :(
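To illustrate why equal keys can still miss in a Hash, here is a minimal sketch. SaltedKey is a hypothetical class invented for this example (it is not part of Ruby): its eql? looks only at the value, while its #hash is deliberately driven by a separate salt, which mimics two equal strings reporting different hashes.

```ruby
# Hypothetical key class: equality ignores the salt, #hash uses only the salt.
class SaltedKey
  attr_reader :value, :salt

  def initialize(value, salt)
    @value = value
    @salt = salt
  end

  # Two keys are "equal" whenever their values match.
  def eql?(other)
    other.is_a?(SaltedKey) && value == other.value
  end
  alias_method :==, :eql?

  # Deliberately inconsistent with eql?: equal keys may report different hashes.
  def hash
    salt
  end
end

a = SaltedKey.new("http://cnnmon.ie/1kcFZSQ", 1)
b = SaltedKey.new("http://cnnmon.ie/1kcFZSQ", 2) # equal to a, different hash
c = SaltedKey.new("http://cnnmon.ie/1kcFZSQ", 1) # equal to a, same hash

redirects = { a => "" }

a.eql?(b)         # => true
redirects.key?(b) # => false, the hashes differ so eql? is never consulted
redirects.key?(c) # => true
```

In the buggy Ruby versions, slice_str and str behaved like a and b here: eql? said they were equal, but their hashes disagreed.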

Related

how to access properties returned by whois parser in ruby?

require 'rubygems'
require 'whois'
c = Whois::Client.new
r = c.lookup("seogroup.com")
puts r.admin_contacts
produces this:
#<struct Whois::Record::Contact id=nil, type=2, name="Marvin Russell", organization="SEO Group, LLC", address="222 W Ontario", city="Chicago", zip="60654", state="Illinois", country="United States", country_code=nil, phone="847-452-9902", fax=nil, email="marvin#seogroup.com", url=nil, created_on=nil, updated_on=nil>
How do I get at properties like "state", "email" and "name"?
When I run your code I get back an array:
2.1.2 :013 > r.admin_contacts
=> [#<struct Whois::Record::Contact id=nil, type=2, name="Marvin Russell", organization="SEO Group, LLC", address="222 W Ontario", city="Chicago", zip="60654", state="Illinois", country="United States", country_code=nil, phone="847-452-9902", fax=nil, email="marvin#seogroup.com", url=nil, created_on=nil, updated_on=nil>]
And then introspecting a bit I see:
2.1.2 :014 > r.admin_contacts.class
=> Array
2.1.2 :015 > r.admin_contacts.length
=> 1
2.1.2 :016 > r.admin_contacts[0].class
=> Whois::Record::Contact
Then I took a look at the available methods:
2.1.2 :017 > r.admin_contacts[0].methods
=> [:id, :id=, :type, :type=, :name...
And then pulled up the name and email:
2.1.2 :018 > r.admin_contacts[0][:name]
=> "Marvin Russell"
2.1.2 :019 > r.admin_contacts[0][:email]
=> "marvin#seogroup.com"
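Since the contact is a Struct, the usual Struct accessors apply. Here is a minimal sketch that runs without the whois gem: Contact below is a stand-in defined just for this example, reduced to three members, and the values (including the example.com address) are placeholders.

```ruby
# Stand-in for Whois::Record::Contact (assumption: only three members shown).
Contact = Struct.new(:name, :state, :email)

# admin_contacts returned an Array, so model it as an Array of one struct.
contacts = [Contact.new("Marvin Russell", "Illinois", "admin@example.com")]
contact = contacts.first

contact.name        # => "Marvin Russell"   (reader method)
contact[:state]     # => "Illinois"         (symbol index)
contact["email"]    # => "admin@example.com" (string index)
Contact.members     # => [:name, :state, :email]
contact.to_h        # Hash of member name => value (Ruby 2.0+)
contacts.map(&:name) # => ["Marvin Russell"]
```

The reader methods (contact.name) are usually the most readable option; the [] form is handy when the member name is held in a variable.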

Encoding and decoding ruby symbols

I discovered this behavior of multi_json ruby gem:
2.1.0 :001 > require 'multi_json'
=> true
2.1.0 :002 > sym = :symbol
=> :symbol
2.1.0 :003 > sym.class
=> Symbol
2.1.0 :004 > res = MultiJson.load MultiJson.dump(sym)
=> "symbol"
2.1.0 :005 > res.class
=> String
Is this an appropriate way to store ruby symbols? Does JSON provide some way to distinguish :symbol from "string"?
The simple answer is no. Most of the time it only really matters for hash keys, and there is a shortcut for those: symbolize_keys! (from ActiveSupport). The bottom line is that JSON does not understand symbols, only strings.
Since you are using MultiJson, you can also ask MultiJson to do this for you...
MultiJson.load('{"abc":"def"}', :symbolize_keys => true)
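The same round trip can be sketched with the standard library's json instead of MultiJson (a swap made only so the example runs without extra gems). It shows that symbolize_names brings the keys back as symbols, while symbol values come back as strings and have to be converted by hand. The sample data is made up for illustration.

```ruby
require 'json'

data = { status: :active, "name" => "example" }

# Symbols (keys and values alike) are written out as plain JSON strings.
json = JSON.generate(data)
# json == '{"status":"active","name":"example"}'

plain = JSON.parse(json)
# plain == {"status"=>"active", "name"=>"example"}

symbolized = JSON.parse(json, symbolize_names: true)
# Keys are symbols again, but the value is still the String "active".

# Restoring a symbol value is up to the application.
status = symbolized[:status].to_sym # => :active
```

If symbol values matter, the application has to know which fields to convert; JSON itself carries no marker for them.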

Why was the object_id for true and nil changed in ruby2.0?

I came across this Ruby object_id allocation question some time back and then read this awesome article, which talks about VALUE and explains why the object_ids of true, nil and false are what they are. I was toying with object_id in Ruby 2.0 when I noticed an apparent change to the object_ids of true and nil.
forbidden:~$ ruby -v
ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-linux]
forbidden:~$
forbidden:~$ irb
irb(main):001:0> true.object_id
=> 20
irb(main):002:0> false.object_id
=> 0
irb(main):003:0> nil.object_id
=> 8
irb(main):004:0> exit
forbidden:~$
forbidden:~$ rvm use 1.9.3
Using /home/forbidden/.rvm/gems/ruby-1.9.3-p392
forbidden:~$ ruby -v
ruby 1.9.3p392 (2013-02-22 revision 39386) [x86_64-linux]
forbidden:~$
forbidden:~$ irb
irb(main):001:0> true.object_id
=> 2
irb(main):002:0> false.object_id
=> 0
irb(main):003:0> nil.object_id
=> 4
tl;dr: The object_ids of true and nil were 2 and 4 respectively in 1.9.3 and 1.8.7, but changed to 20 and 8 in Ruby 2.0.0, even though the id of false is still 0 and the ids of Fixnums keep the same old 2n+1 pattern.
Also, Fixnum and Bignum are still implemented the same way in 2.0.0, as the example from the above-mentioned article runs just the way it used to:
irb(main):001:0>
irb(main):002:0* ((2**62)).class
=> Bignum
irb(main):003:0> ((2**62)-1).class
=> Fixnum
irb(main):004:0>
What's the reason behind this object_id change?
Why was this change made? How is this going to help developers?
A look at the Ruby source where these values are defined suggests that this has something to do with “flonums” (also see the commit where this was introduced). A search for “flonum” came up with a message on the Ruby mailing list discussing it.
This is a technique for speeding up floating-point calculations on 64-bit machines by using immediate values for some floating-point values, similar to using Fixnums for integers. The pattern for flonums is ...xxxx xx10 (i.e. the last two bits are 10, whereas for Fixnums the last bit is 1). The object_ids of the other immediate values were changed to accommodate this.
You can see this change by looking at the object_ids of floats in Ruby 1.9.3 and 2.0.0.
In 1.9.3 different floats with the same value are different objects:
1.9.3p385 :001 > s = 10.234
=> 10.234
1.9.3p385 :002 > t = 10.234
=> 10.234
1.9.3p385 :003 > s.object_id
=> 2160496240
1.9.3p385 :004 > t.object_id
=> 2160508080
In 2.0.0 they are the same:
2.0.0p0 :001 > s = 10.234
=> 10.234
2.0.0p0 :002 > t = 10.234
=> 10.234
2.0.0p0 :003 > s.object_id
=> 82118635605473626
2.0.0p0 :004 > t.object_id
=> 82118635605473626
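The tag bits described above can be checked directly against the ids shown in this thread. This is only a sketch: immediate_kind is a hypothetical helper written for this example, and the bit patterns it tests are the ones described for 64-bit CRuby 2.0, so other implementations or platforms may differ.

```ruby
# Hypothetical helper: classify an object_id by its low tag bits,
# following the 64-bit CRuby 2.0 layout described above.
def immediate_kind(id)
  if (id & 0b1) == 0b1
    :fixnum # last bit is 1 (the 2n+1 pattern)
  elsif (id & 0b11) == 0b10
    :flonum # last two bits are 10
  else
    :other  # true, false, nil, and everything else
  end
end

immediate_kind(2 * 5 + 1)          # => :fixnum (id of the integer 5)
immediate_kind(82118635605473626)  # => :flonum (id of 10.234 in 2.0.0 above)
immediate_kind(20)                 # => :other  (true in 2.0.0)
immediate_kind(8)                  # => :other  (nil in 2.0.0)
immediate_kind(0)                  # => :other  (false)
```

Note that the old ids of true (2) would have matched the flonum pattern, which fits the explanation that the other immediates had to move.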

How can I escape a Ruby symbol with quotes only if needed?

IRB and Rails console both have a nice way of outputting symbols that only quote-escapes them when necessary. Some examples:
1.9.3p194 :001 > "@test".to_sym
=> :@test
1.9.3p194 :002 > "@Test".to_sym
=> :@Test
1.9.3p194 :003 > "@123".to_sym
=> :"@123"
1.9.3p194 :004 > "@@_test".to_sym
=> :@@_test
1.9.3p194 :005 > "test?".to_sym
=> :test?
1.9.3p194 :006 > "test!".to_sym
=> :test!
1.9.3p194 :007 > "_test!".to_sym
=> :_test!
1.9.3p194 :008 > "_test?".to_sym
=> :_test?
1.9.3p194 :009 > "A!".to_sym
=> :"A!"
1.9.3p194 :010 > "@a!".to_sym
=> :"@a!"
How would you do this yourself, so that you could do:
puts "This is valid code: #{escape_symbol(some_symbol)}"
The easiest and best way to do this is via Symbol's inspect method:
1.9.3p194 :013 > puts "This is valid code: #{"@a!".to_sym.inspect}"
This is valid code: :"@a!"
=> nil
1.9.3p194 :014 > puts "This is valid code: #{"a!".to_sym.inspect}"
This is valid code: :a!
You could look at the sym_inspect(VALUE sym) method in string.c in Ruby 1.9.3 that does that, if you're curious.
So, even though you don't need another method to call inspect, this would be the simplest implementation:
def escape_symbol(sym)
  sym.inspect
end
Here's my attempt at implementing it with a few regexes, although I'd suggest using inspect instead if you can:
def escape_symbol(sym)
  name = sym.to_s # Symbol has no gsub, so work on the String form
  name =~ /^[@a-zA-Z_]@?[a-zA-Z_0-9]*$/ || name =~ /^[a-z_][a-zA-Z_0-9]*[?!]?$/ ? ":#{name}" : ":\"#{name.gsub(/"/, '\\"')}\""
end

replicate CSV.generate_line behaviour of ruby 1.8.7 in ruby 1.9.2

Ruby 1.9 now uses FasterCSV, but how do I replicate the generate_line behaviour of Ruby 1.8.7?
ruby-1.8.7-p334 :010 > require 'csv'
=> true
ruby-1.8.7-p334 :010 > CSV.generate_line(["ab","cd"], "\t")
=> "ab\tcd"
ruby-1.9.2-p180 :002 > require 'csv'
=> true
ruby-1.9.2-p180 :007 > CSV.generate_line(["ab","cd"], :row_sep => ?\t)
=> "ab,cd\t"
Notice how \t is between the two array items in Ruby 1.8.7 but at the end of the line in 1.9.2.
You have to use col_sep instead. row_sep is the row separator:
CSV.generate_line(["ab","cd"], :col_sep => ?\t)
=> "ab\tcd\n"
or
CSV.generate_line(["ab","cd"], :col_sep => ?\t, :row_sep => '')
=> "ab\tcd"
You can find more details and additional options in the documentation.
CSV.generate_line(['a','b','c'],:col_sep=>"\t")
