Why do two identical strings have the same object_id in Ruby? - ruby

As you may know, in Ruby two identical strings do not have the same object_id, while two identical symbols do. For instance:
irb(main):001:0> :george.object_id == :george.object_id
=> true
irb(main):002:0> "george".object_id == "george".object_id
=> false
However, my code below shows two strings with the same value "one" having the same object_id.
class MyArray < Array
def ==(x)
comparison = Array.new()
x.each_with_index{|item, i| comparison.push(item.object_id.equal?(self[i].object_id))}
if comparison.include?(false) then
false
else
true
end
end
end
class MyHash < Hash
def ==(x)
y = Hash[self.sort]
puts y.class
puts y
x = Hash[x.sort]
puts x.class
puts x
puts "______"
xkeys = MyArray.new(x.keys)
puts xkeys.class
puts xkeys.to_s
puts xkeys.object_id
puts xkeys[0].class
puts xkeys[0]
puts xkeys[0].object_id
puts "______"
xvals = MyArray.new(x.values)
puts "______"
selfkeys = MyArray.new(y.keys)
puts selfkeys.class
puts selfkeys.to_s
puts selfkeys.object_id
puts selfkeys[0].class
puts selfkeys[0]
puts selfkeys[0].object_id
puts "______"
selfvals = MyArray.new(y.values)
puts xkeys.==(selfkeys)
puts xvals.==(selfvals)
end
end
a1 = MyHash[{"one" => 1, "two" => 2}]
b1 = MyHash[{"one" => 1, "two" => 2}]
puts a1.==(b1)
And I get:
Hash
{"one"=>1, "two"=>2}
Hash
{"one"=>1, "two"=>2}
______
MyArray
["one", "two"]
21638020
String
one
21641920
______
______
MyArray
["one", "two"]
21637580
String
one
21641920
______
true
true
As you can see from the result, the two String objects with the value "one" have the same object_id, 21641920, although they are supposed to have different IDs. Can anyone give me some hints, or tell me how I can get different IDs in this case?
Best Regards.

When a String object is used as a key in a Hash, the hash will duplicate and freeze the string internally and will use that copy as its key.
Reference: Hash#store.
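A minimal sketch of that behaviour, in case it helps. The variable names are only illustrative, and String.new is used just to guarantee that the original string starts out unfrozen:

```ruby
# Illustrative names; String.new guarantees a mutable (unfrozen) string.
key = String.new("one")
h = {}
h[key] = 1

stored = h.keys.first
key_frozen    = key.frozen?         # false: the original string is untouched
stored_frozen = stored.frozen?      # true: the hash keeps a frozen copy
same_object   = stored.equal?(key)  # false: the copy is a different object

puts key_frozen, stored_frozen, same_object
```

So the key you read back from the hash is never the mutable string you passed in.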

As of ruby 2.2 strings used as keys in hash literals are frozen and de-duplicated: the same string will be reused.
This is a performance optimisation: not allocating many copies of the same string means there are fewer objects to allocate and fewer to garbage collect.
Another way to see frozen string literals in action:
"foo".freeze.object_id == "foo".freeze.object_id
This returns true in Ruby >= 2.1.
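To answer the "how can I get different IDs" part: one option, sketched here under the assumption that you only need independent copies of the keys, is to dup them after reading them from the hash. The variable names are made up for the example:

```ruby
a = { "one" => 1 }
b = { "one" => 1 }

# On CRuby >= 2.2 the two literal keys are expected to be one shared frozen object.
shared = a.keys.first.equal?(b.keys.first)

# dup returns a new, unfrozen String, so each copy is its own object.
a_keys = a.keys.map(&:dup)
b_keys = b.keys.map(&:dup)
distinct = !a_keys.first.equal?(b_keys.first)

puts shared, distinct
```

Comparing by object_id is rarely what you want for strings anyway; == compares the contents and works the same whether or not the objects are shared.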

Related

Why a block invoked by a Module can't modify objects from implementing classes in Ruby?

I have some data saved in deeply nested Hashes and Arrays and I have run into trouble with the text encoding of the data. I know for a fact that the texts are encoded in "UTF-8", so I decided to go over each element and force the encoding.
So, I created a method called deep_each for the Enumerable module:
module Enumerable
def deep_each(&block)
self.each do |element|
if element.is_a? Enumerable then
element.deep_each(&block)
else
block[element]
end
end
end
end
And expected to be able to fix the data using the following method call:
deephash.deep_each {|element| element.force_encoding("UTF-8") if element.class == String}
But the result was disappointing:
deephash.deep_each {|element| element.force_encoding("UTF-8") if element.class == String}
> RuntimeError: can't modify frozen String
> from (pry):16:in `force_encoding'
Then I moved the function down the hierarchy, to the "Array" and "Hash" classes:
class Hash
def deep_each(&block)
self.each do |element|
if [Array, Hash].include? element.class then
element.deep_each(&block)
else
block[element]
end
end
end
end
class Array
def deep_each(&block)
self.each do |element|
if [Array, Hash].include? element.class then
element.deep_each(&block)
else
block[element]
end
end
end
end
Surprisingly, the same call works now.
What constraint am I violating here, and how can I define a method for all Enumerables without defining it for every single one of them?
As far as I can tell, you should get the exact same error with both your Enumerable version and your Array/Hash monkey patch. I do. Are you sure you're using the same deephash in both cases?
Normally when you loop each on a hash, you'd pass in both key and value to the block. You're passing a single value element to the block. This then is an Array with the key and value:
irb> {a:1, b:2}.each {|el| puts el.inspect }
[:a, 1]
[:b, 2]
Your deep_each checks if this is an Enumerable, and it is, so it calls deep_each on the list. Then, finally, you reach the leaves and call the block on the key and the value. The block checks if it's working with a String, and if so, forces encoding.
If your hash key is a string, you will try to mutate it. But hash keys are frozen, and so RuntimeError: can't modify frozen String is raised.
irb> {a: {b: {c: "abc"}}}.deep_each { |el| el << "efg" if String === el}
=> {:a=>{:b=>{:c=>"abcefg"}}}
irb> {a: {b: {"c" => "abc"}}}.deep_each { |el| el << "efg" if String === el}
RuntimeError: can't modify frozen String
str = "\xE2\x82\xAC" #Euro sign in UTF-8
puts str.encoding #=> UTF-8
puts str #=> Euro sign in a UTF-8 enabled terminal window
File.open('data.txt', 'w:utf-8') do |f|
f.write("#{str}\n")
end
Encoding.default_external = 'ISO-8859-1'
str = File.read('data.txt')
puts str.encoding #=> ISO-8859-1
arr = [
{a: str},
{b: 'world'},
]
arr[0][:a].force_encoding('utf-8')
puts arr[0][:a].encoding #=> UTF-8
puts arr[0][:a] #=> Euro sign in a UTF-8 enabled terminal window
It would be more illustrative if you posted an example of what you mean by "I have run into trouble with the text encoding of the data".
Finally, it looks like writing the method for each class separately makes more sense. For the Hash I need to use each_value rather than each.
You can do something like this:
iterator_for = Hash.new(:each) #When a non-existent key is looked up, return :each
iterator_for.update({
Hash => :each_value,
})
data = [
%w{ hello world goodbye },
{"a" => "red", "b" => "blue"},
]
data.each do |element|
element.send(iterator_for[element.class]) do |x|
puts x
end
puts '-' * 20
end
--output:--
hello
world
goodbye
--------------------
red
blue
--------------------
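Putting the pieces of this thread together, here is a rough sketch of a recursive walker that only visits values and never touches the (frozen) hash keys. deep_each_value is a hypothetical helper name, not an existing Ruby method, and it is written as a plain method rather than a patch on Enumerable:

```ruby
# Hypothetical helper: yields every leaf value, descending into Hash values
# and Array elements but skipping Hash keys, which are frozen when Strings.
def deep_each_value(obj, &block)
  case obj
  when Hash  then obj.each_value { |v| deep_each_value(v, &block) }
  when Array then obj.each { |v| deep_each_value(v, &block) }
  else block.call(obj)
  end
end

# Sample data: String#b returns a binary (ASCII-8BIT) copy of each string.
data = { "name" => ["caf\xC3\xA9".b, { "city" => "Z\xC3\xBCrich".b }] }

deep_each_value(data) { |v| v.force_encoding("UTF-8") if v.is_a?(String) }

puts data["name"][0].encoding
puts data["name"][1]["city"].valid_encoding?
```

The case/when dispatch plays the same role as the iterator_for lookup above, just with the recursion built in.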

Calling specific element from array not returning (Ruby)

I can't tell what's wrong with my code:
def morse_code(str)
string = []
string.push(str.split(' '))
puts string
puts string[2]
end
What I'm expecting is if I use "what is the dog" for str, I would get the following results:
=> ["what", "is", "the", "dog"]
=> "the"
But what I get instead is nil. If I do string[0], it just gives me the entire string again. Does the .split function not break them up into different elements? If anyone could help, that would be great. Thank you for taking the time to read this.
Your code should be :
def morse_code(str)
string = []
string.push(*str.split(' '))
puts string
p string[2]
end
morse_code("what is the dog" )
# >> what
# >> is
# >> the
# >> dog
# >> "the"
str.split(' ') returns ["what", "is", "the", "dog"], and you are pushing that array object onto the array string. string therefore becomes [["what", "is", "the", "dog"]], an array of size 1, so accessing any index other than 0 (1, 2, and so on) returns nil. You can debug this with p (it calls #inspect on the array), but not with puts.
def morse_code(str)
string = []
string.push(str.split(' '))
p string
end
morse_code("what is the dog" )
# >> [["what", "is", "the", "dog"]]
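For comparison, a short sketch of the three ways of adding the words side by side. The variable names are just for illustration:

```ruby
words = "what is the dog".split(' ')

nested = []
nested.push(words)         # pushes the array itself as a single element

flat = []
flat.push(*words)          # splat: pushes each word as its own element

joined = [].concat(words)  # concat appends the elements, no splat needed

p nested      # [["what", "is", "the", "dog"]]
p nested[2]   # nil
p flat[2]     # "the"
p joined[2]   # "the"
```

Of course, if the array starts out empty you can simply write string = str.split(' ') and skip the push altogether.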
With an Array, puts works completely differently from p. I don't always find the MRI code easy to read, so I sometimes look at the Rubinius code instead. Look at how they defined IO::puts, which behaves the same as MRI's. Now look at the specs for that code:
it "flattens a nested array before writing it" do
@io.should_receive(:write).with("1")
@io.should_receive(:write).with("2")
@io.should_receive(:write).with("3")
@io.should_receive(:write).with("\n").exactly(3).times
@io.puts([1, 2, [3]]).should == nil
end
it "writes nothing for an empty array" do
x = []
@io.should_receive(:write).exactly(0).times
@io.puts(x).should == nil
end
it "writes [...] for a recursive array arg" do
x = []
x << 2 << x
@io.should_receive(:write).with("2")
@io.should_receive(:write).with("[...]")
@io.should_receive(:write).with("\n").exactly(2).times
@io.puts(x).should == nil
end
We can now be sure that IO::puts or Kernel::puts handles arrays just the way the Rubinius people implemented it. You can also take a look at the MRI code. I just found the MRI test; look at it below:
def test_puts_recursive_array
a = ["foo"]
a << a
pipe(proc do |w|
w.puts a
w.close
end, proc do |r|
assert_equal("foo\n[...]\n", r.read)
end)
end
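You can also see the flattening for yourself without reading any interpreter source. A small sketch, using a StringIO only as a stand-in for standard output so the result can be inspected:

```ruby
require 'stringio'

out = StringIO.new
out.puts([["what", "is"], ["the"]])   # nested arrays are flattened, one item per line

flattened = out.string                # "what\nis\nthe\n"
inspected = [["what", "is"], ["the"]].inspect

print flattened
puts inspected                        # [["what", "is"], ["the"]]
```

That is why puts hid the extra level of nesting in the question, while p (which prints the inspect form) shows it.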

Creating an md5 hash of a number, string, array, or hash in Ruby

I need to create a signature string for a variable in Ruby, where the variable can be a number, a string, a hash, or an array. The hash values and array elements can also be any of these types.
This string will be used to compare the values in a database (Mongo, in this case).
My first thought was to create an MD5 hash of a JSON encoded value, like so: (body is the variable referred to above)
def createsig(body)
Digest::MD5.hexdigest(JSON.generate(body))
end
This nearly works, but JSON.generate does not encode the keys of a hash in the same order each time, so createsig({:a=>'a',:b=>'b'}) does not always equal createsig({:b=>'b',:a=>'a'}).
What is the best way to create a signature string to fit this need?
Note: For the detail oriented among us, I know that you can't JSON.generate() a number or a string. In these cases, I would just call MD5.hexdigest() directly.
I coded the following up pretty quickly and don't have time to really test it here at work, but it ought to do the job. Let me know if you find any issues with it and I'll take a look.
This should properly flatten out and sort the arrays and hashes, and you'd need to have some pretty strange looking strings for there to be any collisions.
require 'digest/md5'
def createsig(body)
Digest::MD5.hexdigest( sigflat body )
end
def sigflat(body)
if body.class == Hash
arr = []
body.each do |key, value|
arr << "#{sigflat key}=>#{sigflat value}"
end
body = arr
end
if body.class == Array
# Work on a sorted copy so the caller's array is not mutated
body = body.map do |value|
sigflat value
end.sort
end
if body.class != String
# + builds a new string; << would fail on frozen results such as nil.to_s
body = body.to_s + body.class.to_s
end
body
end
> sigflat({:a => {:b => 'b', :c => 'c'}, :d => 'd'}) == sigflat({:d => 'd', :a => {:c => 'c', :b => 'b'}})
=> true
If you could only get a string representation of body and not have the Ruby 1.8 hash come back with different orders from one time to the other, you could reliably hash that string representation. Let's get our hands dirty with some monkey patches:
require 'digest/md5'
class Object
def md5key
to_s
end
end
class Array
def md5key
map(&:md5key).join
end
end
class Hash
def md5key
sort.map(&:md5key).join
end
end
Now any object (of the types mentioned in the question) responds to md5key by returning a reliable key to use for creating a checksum, so:
def createsig(o)
Digest::MD5.hexdigest(o.md5key)
end
Example:
body = [
{
'bar' => [
345,
"baz",
],
'qux' => 7,
},
"foo",
123,
]
p body.md5key # => "bar345bazqux7foo123"
p createsig(body) # => "3a92036374de88118faf19483fe2572e"
Note: This hash representation does not encode the structure, only the concatenation of the values. Therefore ["a", "b", "c"] will hash the same as ["abc"].
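If that loss of structure matters, one possible variation (not part of the code above, just a sketch) is to normalise the value first and then let JSON do the serialising, since JSON output keeps brackets and quotes. canonical and createsig_canonical are hypothetical names, and the sketch assumes hash keys can be compared after to_s:

```ruby
require 'digest/md5'
require 'json'

# Hypothetical helper: returns a copy with every Hash rebuilt in sorted key order.
# Keys are converted with to_s, so :a and "a" are treated as the same key.
def canonical(obj)
  case obj
  when Hash
    obj.map { |k, v| [k.to_s, canonical(v)] }.sort_by(&:first).to_h
  when Array
    obj.map { |v| canonical(v) }
  else
    obj
  end
end

# Wrapping in an array lets bare numbers and strings be serialised too.
def createsig_canonical(body)
  Digest::MD5.hexdigest(JSON.generate([canonical(body)]))
end

puts createsig_canonical({ :a => 'a', :b => 'b' }) == createsig_canonical({ :b => 'b', :a => 'a' })
puts createsig_canonical(["a", "b", "c"]) == createsig_canonical(["abc"])
```

Unlike the sorting versions, this keeps array order significant, which may or may not be what you want for your database comparison.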
Here's my solution. I walk the data structure and build up a list of pieces that get joined into a single string. In order to ensure that the class types seen affect the hash, I inject a single unicode character that encodes basic type information along the way. (For example, we want ["1", "2", "3"].objsum != [1,2,3].objsum)
I did this as a refinement on Object, it's easily ported to a monkey patch. To use it just require the file and run "using ObjSum".
module ObjSum
refine Object do
def objsum
parts = []
queue = [self]
while queue.size > 0
item = queue.shift
if item.kind_of?(Hash)
parts << "\\000"
item.keys.sort.each do |k|
queue << k
queue << item[k]
end
elsif item.kind_of?(Set)
parts << "\\001"
item.to_a.sort.each { |i| queue << i }
elsif item.kind_of?(Enumerable)
parts << "\\002"
item.each { |i| queue << i }
elsif item.kind_of?(Fixnum)
parts << "\\003"
parts << item.to_s
elsif item.kind_of?(Float)
parts << "\\004"
parts << item.to_s
else
parts << item.to_s
end
end
Digest::MD5.hexdigest(parts.join)
end
end
end
Just my 2 cents:
module Ext
module Hash
module InstanceMethods
# Return a string suitable for generating content signature.
# Signature image does not depend on order of keys.
#
# {:a => 1, :b => 2}.signature_image == {:b => 2, :a => 1}.signature_image # => true
# {{:a => 1, :b => 2} => 3}.signature_image == {{:b => 2, :a => 1} => 3}.signature_image # => true
# etc.
#
# NOTE: Signature images of identical content generated under different versions of Ruby are NOT GUARANTEED to be identical.
def signature_image
# Store normalized key-value pairs here.
ar = []
each do |k, v|
ar << [
k.is_a?(::Hash) ? k.signature_image : [k.class.to_s, k.inspect].join(":"),
v.is_a?(::Hash) ? v.signature_image : [v.class.to_s, v.inspect].join(":"),
]
end
ar.sort.inspect
end
end
end
end
class Hash #:nodoc:
include Ext::Hash::InstanceMethods
end
These days there is a formally defined method for canonicalizing JSON, for exactly this reason: https://datatracker.ietf.org/doc/html/draft-rundgren-json-canonicalization-scheme-16
There is a ruby implementation here: https://github.com/dryruby/json-canonicalization
Depending on your needs, you could call ary.inspect or ary.to_yaml, even.

Binary or "|" in ruby

Why isn't this working:
>> s = "hi"
=> "hi"
>> s == ("hi"|"ho")
NoMethodError: undefined method `|' for "hi":String
from (irb):2
>>
I don't get it. Is there a solution for this kind of syntax? Because
s == ("hi"|"ho")
#is shorter than
s == "hi" || s == "ho"
Yes, the bitwise operator | is not defined in the String class: http://ruby-doc.org/core/classes/String.html
Consider this for expressiveness:
["hi", "ho"].include? myStr
irb(main):001:0> s = "hi"
=> "hi"
irb(main):002:0> ["hi", "ho"]
=> ["hi", "ho"]
irb(main):003:0> ["hi", "ho"].include? s
=> true
irb(main):004:0> s = "foo"
=> "foo"
irb(main):005:0> ["hi", "ho"].include? s
=> false
In most high level languages that syntax will not work, you have to stick to the longer syntax of:
s == "hi" || s == "ho"
Note that | is a bitwise or, whereas || is a logical (boolean) or.
You could use the include? method on array if you've got several == tests to do:
["hi", "ho"].include?(s)
Not shorter for two checks admittedly but it will be shorter for three or more.
This syntax doesn't exist in any language as far as I know.
What you are saying
s == ("hi"|"ho")
Literally translates to 'bitwise OR the strings "hi" and "ho" together and then compare them with s'. If you can't see why this is not what you are looking for, try writing down the ASCII codes for "hi" and "ho" and then bitwise ORing them together. You are going to get complete gibberish.
You could make it work that way:
irb> class Pair
def initialize(strA,strB)
@strA,@strB = strA,strB
end
def ==(string)
string == @strA || string == @strB
end
def |(other)
Pair.new(self,other)
end
end
#=> nil
irb> class String
def |(other)
Pair.new(self,other)
end
alias old_equals :==
def ==(other)
if other.kind_of? Pair
other == self
else
old_equals other
end
end
end
#=> nil
irb> ("one"|"two") == "one"
#=> true
irb> ("one"|"two") == "two"
#=> true
irb> ("one"|"two") == "three"
#=> false
irb> "one" == ("one"|"two")
#=> true
irb> "three" == ("one"|"two"|"three")
#=> true
But since this involves some monkey-patching of a fairly low-level class, I wouldn't advise relying on it. Other people will hate reading your code.
Ruby supports binary 'or' and other binary operations on values of type Fixnum and Bignum, meaning any integer. Bitwise operations aren't supported on strings or any other type, as far as I know.
As other people have mentioned, you probably want something other than binary operations altogether. However, you can easily get integer representations of characters, so you can compare characters like so:
a = "Cake"
b = "Pie"
puts a[0] | b[0] # Prints "83" - C is 67 and P is 80.
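Note that indexing a string returns an integer only on Ruby 1.8; from 1.9 on, a[0] is a one-character String. A version of the same idea that should work on current Rubies, using String#ord to get the character code explicitly:

```ruby
a = "Cake"
b = "Pie"

# ord returns the integer code of the first character: "C" is 67, "P" is 80.
combined = a[0].ord | b[0].ord

puts combined       # 83
puts combined.chr   # S
```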
You can get an array of the comparisons easily with some conversions.
a = "Cake"
b = "Pie " # Handling strings of uneven length is trivial but more cluttered.
a_arr = a.split(//)
b_arr = b.split(//)
c_arr = []
# OR the character codes pairwise (ord gives the integer code of a character)
a_arr.each_with_index { |char, i| c_arr.push(char.ord | b_arr[i].ord) }
# If you *really* want an ASCII string back...
c = c_arr.collect(&:chr).join
You could use a regex:
Like so:
regex = /hi|ho/
s = "hi"
t = "foo"
s =~ regex
#=> 0
t =~ regex
#=> nil
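One thing to watch with the regex approach: /hi|ho/ matches anywhere inside the string, so it is not quite the same test as s == "hi" || s == "ho". A small sketch of an anchored pattern for an exact match (match? needs Ruby 2.4 or later; the variable names are illustrative):

```ruby
loose  = /hi|ho/             # matches "hi" or "ho" anywhere in the string
strict = /\A(?:hi|ho)\z/     # matches only when the whole string is "hi" or "ho"

loose_hit   = !("hippo" =~ loose).nil?   # true: "hippo" contains "hi"
strict_hit  = strict.match?("hippo")     # false
strict_hi   = strict.match?("hi")        # true
strict_ho   = strict.match?("ho")        # true

puts loose_hit, strict_hit, strict_hi, strict_ho
```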

hash methods argument values

I'm trying to understand the syntax for calling methods on different values of a hash.
For example, let's say I am trying to delete 'pants'. How do I go about setting the argument for something like this:
products = {124 => ['shoes', 59.99], 352 => ['shirt', 19.99], 777 => ['pants', 19.87],
667 => ['jacket', 39.99], 898 => ['shoulder_holster', 22.78]}
While writing a menu-driven program for this hash, I'm including error checking before deleting or adding a key. This is what I have so far:
if a == 3 # Loop delete a Product
puts "Delete a Product"
d = gets.to_s # Get value for argument
while products.has_value?( d + syntax for right here???? )!= true do
puts "This turned out false because product does not exsist!"
d = gets.to_s
end
puts "Congrats your out of the loop"
products.delete(d + again syntax problems ???? )
puts products
end
How do I write the argument if I were to delete pants? Would it be ([d,:number])? I'm not having any luck finding resources online on how to delete or add in this scenario. Any help or code example would be appreciated,
Matt
products.to_a.select {|a| a.last.first == 'pants' }
That will get you the record that matches 'pants'.
[[777, ["pants", 19.87]]]
So I think you'll want
while products.to_a.select {|a| a.last.first == d }.empty?
on your loop (it keeps asking while no product matches), then use Dafydd's line to delete the record.
It depends on whether the user is inputting the ID number or the name "pants". If the former:
if a == 3 # Loop delete a Product
puts "Delete a Product"
d = gets # Get value for argument
until products.has_key?(d.to_i)
puts "This turned out false because product does not exsist!"
d = gets
end
puts "Congrats your out of the loop"
products.delete(d.to_i)
puts products
end
If it's "pants", then this is how you want to do it:
if a == 3 # Loop delete a Product
puts "Delete a Product"
d = gets.strip # Need to strip because otherwise the newline will wreck it
until products.find {|key, val| val.first == d}
puts "This turned out false because product does not exsist!"
d = gets.strip
end
puts "Congrats your out of the loop"
products.delete_if {|key, val| val.first == d}
puts products
end
Writing a "delete named product from hash" method
There are shorter ways of doing it, but shooting for clarity I came up with this:
products = {124 => ['shoes', 59.99], 352 => ['shirt', 19.99], 777 => ['pants', 19.87],
667 => ['jacket', 39.99], 898 => ['shoulder_holster', 22.78]}
def wipeProduct(hash, nameToDelete)
hash.each do |i|
key = i[0]
productName = i[1].first
hash.delete(key) if productName==nameToDelete
end
end
puts products.inspect
wipeProduct(products,'pants')
puts products.inspect
wipeProduct(products,'shoulder_holster')
puts products.inspect
bash-3.2$ ruby prod.rb
{352=>["shirt", 19.99], 898=>["shoulder_holster", 22.78], 667=>["jacket", 39.99], 777=>["pants", 19.87], 124=>["shoes", 59.99]}
{352=>["shirt", 19.99], 898=>["shoulder_holster", 22.78], 667=>["jacket", 39.99], 124=>["shoes", 59.99]}
{352=>["shirt", 19.99], 667=>["jacket", 39.99], 124=>["shoes", 59.99]}
I don't know if it's possible for "pants" to occur in the hash in multiple places, but since I used "hash.each(...)", the method wipeProduct(hash, nameToDelete) will test every hash entry.
The input type bug and how to fix it
When you take input, you're assigning the string you captured to d. Here's the proof:
irb(main):010:0> d = gets.to_s
12
=> "12\n"
irb(main):011:0> d.class
=> String
You can convert that string to a Fixnum like this:
irb(main):012:0> d.to_i
=> 12
irb(main):013:0> d.to_i.class
=> Fixnum
All keys in the products hash are Fixnums. Here's the proof:
irb(main):014:0> products.keys.each {|i| puts i.class}
Fixnum
Fixnum
Fixnum
Fixnum
Fixnum
=> [352, 898, 667, 777, 124]
So you need to capture the value for the argument with this line:
d = gets.to_i # Get value for argument
The deletion part of the answer:
From products, you can delete the pants entry programmatically with this:
products.delete(777)
Running it gets you this:
irb(main):003:0> products.delete(777)
=> ["pants", 19.87]
Notice that you supply the key (in this case 777) to .delete() and that it returns the value that was stored under that key, here the array holding the product name and price.
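A short sketch of what Hash#delete hands back, using a cut-down copy of the products hash. The key 999 is just an arbitrary key that is not in the hash:

```ruby
products = { 124 => ['shoes', 59.99], 777 => ['pants', 19.87] }

removed = products.delete(777)   # the value stored under 777
missing = products.delete(999)   # nil, because there is no such key

p removed         # ["pants", 19.87]
p missing         # nil
p products.keys   # [124]
```

Because a missing key gives nil, you can also use the return value of delete itself as the existence check.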
An alternative implementation
I'm not sure if it's safe to modify a hash in a block that's iterating over the key-value pairs in the hash. If it isn't, you can just save up all the keys to be deleted and delete them after iterating over the hash:
def wipeProduct(hash, nameToDelete)
keysToDelete = []
hash.each do |i|
key = i[0]
productName = i[1].first
keysToDelete << key if productName==nameToDelete
end
keysToDelete.each {|key| hash.delete(key) }
end
Here's the neater way to delete the "pants" entry:
def wipeProduct(hash, nameToDelete)
hash.reject!{|key,value| nameToDelete==value.first}
end
The reject! block gets to see each key-value pair, and when it returns true, the key-value supplied will be removed from the hash.
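One detail worth knowing if you use the return value: reject! returns nil when nothing was removed, while delete_if always returns the hash. A quick sketch on a cut-down products hash (variable names are illustrative):

```ruby
products = { 124 => ['shoes', 59.99], 777 => ['pants', 19.87] }

first  = products.reject!   { |_key, value| value.first == 'pants' }  # the hash: an entry was removed
second = products.reject!   { |_key, value| value.first == 'pants' }  # nil: nothing left to remove
third  = products.delete_if { |_key, value| value.first == 'pants' }  # the hash, changed or not

p first    # {124=>["shoes", 59.99]} (printed as {124 => [...]} on newer Rubies)
p second   # nil
p third.equal?(products)
```

So the nil from reject! can double as a "no such product" signal in the menu loop.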
if a == 3 # Loop delete a Product
puts "Delete a Product by its key number"
d = gets.to_i # The hash keys are integers, so convert the input
while !products.has_key?(d) do
puts "You have selected a key that is not currently in use"
d = gets.to_i
end
puts "You have deleted"
products.delete(d)
puts products
end
This is what I ended up doing. I had some trouble with the until loop, so I swapped it for a while loop, because it wouldn't accept newly entered keys for some reason.
