Related
I'm interested in implementing a custom equality method for use in an array of objects in Ruby. Here's a stripped-back example:
class Foo
attr_accessor :a, :b
def initialize(a, b)
#a = a
#b = b
end
def ==(other)
puts 'doing comparison'
#a == #a && #b == #b
end
def to_s
"#{#a}: #{#b}"
end
end
a = [
Foo.new(1, 1),
Foo.new(1, 2),
Foo.new(2, 1),
Foo.new(2, 2),
Foo.new(2, 2)
]
a.uniq
I expected the uniq method to call Foo#==, and remove the last instance of Foo. Instead, I don't see the 'doing comparison' debug line and the array remains the same length.
Notes:
I'm using ruby 2.2.2
I've tried defining the method as ===
I have done it long-hand with a.uniq{|x| [x.a, x.b]}, but I don't like this solution it's making the code look pretty cluttered.
It compares values using their hash and eql? methods for efficiency.
https://ruby-doc.org/core-2.5.0/Array.html#method-i-uniq-3F
So you should override eql? (that is ==) and hash
UPDATE:
I cannot explain fully why is that, but overriding hash and == doesn't work. I guess it's cause by the way uniq is implemented in C:
From: array.c (C Method):
Owner: Array
Visibility: public
Number of lines: 20
static VALUE
rb_ary_uniq(VALUE ary)
{
VALUE hash, uniq;
if (RARRAY_LEN(ary) <= 1)
return rb_ary_dup(ary);
if (rb_block_given_p()) {
hash = ary_make_hash_by(ary);
uniq = rb_hash_values(hash);
}
else {
hash = ary_make_hash(ary);
uniq = rb_hash_values(hash);
}
RBASIC_SET_CLASS(uniq, rb_obj_class(ary));
ary_recycle_hash(hash);
return uniq;
}
You can bypass that by using a block version of uniq:
> [Foo.new(1,2), Foo.new(1,2), Foo.new(2,3)].uniq{|f| [f.a, f.b]}
=> [#<Foo:0x0000562e48937cc8 #a=1, #b=2>, #<Foo:0x0000562e48937c78 #a=2, #b=3>]
Or use Struct instead:
F = Struct.new(:a, :b)
[F.new(1,2), F.new(1,2), F.new(2,3)].uniq
# => [#<struct F a=1, b=2>, #<struct F a=2, b=3>]
UPDATE2:
Actually in terms of overriding it's not the same if you override == or eql?. When I overriden eql? It worked as intended:
class Foo
attr_accessor :a, :b
def initialize(a, b)
#a = a
#b = b
end
def eql?(other)
(#a == other.a && #b == other.b)
end
def hash
[a, b].hash
end
def to_s
"#{#a}: #{#b}"
end
end
a = [
Foo.new(1, 1),
Foo.new(1, 2),
Foo.new(2, 1),
Foo.new(2, 2),
Foo.new(2, 2)
]
a.uniq
#=> [#<Foo:0x0000562e483bff70 #a=1, #b=1>,
#<Foo:0x0000562e483bff48 #a=1, #b=2>,
#<Foo:0x0000562e483bff20 #a=2, #b=1>,
#<Foo:0x0000562e483bfef8 #a=2, #b=2>]
You can find the answer in the documentation of Array#uniq (for some reason, it is not mentioned in the documentation of Enumerable#uniq):
It compares values using their hash and eql? methods for efficiency.
The contracts of hash and eql? are as follows:
hash returns an Integer that must be the same for objects which are considered equal, but does not necessarily have to be different for objects that are not equal. This means that different hashes mean that the objects are definitely not equal, but the same hash doesn't tell you anything. Ideally, hash should also be resistant to accidental and deliberate collisions.
eql? is value equality, usually stricter than == but less strict than equal? which is more or less identity: equal? should only return true if you compare an object to itself.
uniq? uses the same trick that is used in hash tables, hash sets, etc. to speed up lookups:
Compare the hashes. Computing a hash should normally be fast.
If the hashes are identical, then, and only then double-check using eql?.
So I need to create an instance method for Array that takes two arguments, the size of an array and an optional object that will be appended to an array.
If the the size argument is less than or equal to the Array.length or the size argument is equal to 0, then just return the array. If the optional argument is left blank, then it inputs nil.
Example output:
array = [1,2,3]
array.class_meth(0) => [1,2,3]
array.class_meth(2) => [1,2,3]
array.class_meth(5) => [1,2,3,nil,nil]
array.class_meth(5, "string") => [1,2,3,"string","string"]
Here is my code that I've been working on:
class Array
def class_meth(a ,b=nil)
self_copy = self
diff = a - self_copy.length
if diff <= 0
self_copy
elsif diff > 0
a.times {self_copy.push b}
end
self_copy
end
def class_meth!(a ,b=nil)
# self_copy = self
diff = a - self.length
if diff <= 0
self
elsif diff > 0
a.times {self.push b}
end
self
end
end
I've been able to create the destructive method, class_meth!, but can't seem to figure out a way to make it non-destructive.
Here's (IMHO) a cleaner solution:
class Array
def class_meth(a, b = nil)
clone.fill(b, size, a - size)
end
def class_meth!(a, b = nil)
fill(b, size, a - size)
end
end
I think it should meet all your needs. To avoid code duplication, you can make either method call the other one (but not both simulaneously, of course):
def class_meth(a, b = nil)
clone.class_meth!(a, b)
end
or:
def class_meth!(a, b = nil)
replace(class_meth(a, b))
end
As you problem has been diagnosed, I will just offer a suggestion for how you might do it. I assume you want to pass two and optionally three, not one and optionally two, parameters to the method.
Code
class Array
def self.class_meth(n, arr, str=nil)
arr + (str ? ([str] : [nil]) * [n-arr.size,0].max)
end
end
Examples
Array.class_meth(0, [1,2,3])
#=> [1,2,3]
Array.class_meth(2, [1,2,3])
#=> [1,2,3]
Array.class_meth(5, [1,2,3])
#=> [1,2,3,nil,nil]
Array.class_meth(5, [1,2,3], "string")
#=> [1,2,3,"string","string"]
Array.class_meth(5, ["dog","cat","pig"])
#=> [1,2,3,"string","string"]
Array.class_meth(5, ["dog","cat","pig"], "string")
#=> [1,2,3,"string","string"]
Array.class_meth(5, ["dog","cat","pig"])
#=> ["dog", "cat", "pig", nil, nil]
Array.class_meth(5, ["dog","cat","pig"], "string")
#=> ["dog", "cat", "pig", "string", "string"]
Before withdrawing his answer, #PatriceGahide suggested using Array#fill. That would be an improvement here; i.e., replace the operative line with:
arr.fill(str ? str : nil, arr.size, [n-arr.size,0].max)
self_copy = self does not make a new object - assignment in Ruby never "copies" or creates a new object implicitly.
Thus the non-destructive case works on the same object (the instance the method was invoked upon) as in the destructive case, with a different variable bound to the same object - that is self.equal? self_copy is true.
The simplest solution is to merely use #clone, keeping in mind it is a shallow clone operation:
def class_meth(a ,b=nil)
self_copy = self.clone # NOW we have a new object ..
# .. so we can modify the duplicate object (self_copy)
# down here without affecting the original (self) object.
end
If #clone cannot be used other solutions involve create a new array or obtain an array #slice (returns a new array) or even append (returning a new array) with #+; however, unlike #clone, these generally lock-into returning an Array and not any sub-type as may be derived.
After the above change is made it should also be apparent that it can written as so:
def class_meth(a ,b=nil)
clone.class_meth!(a, b) # create a NEW object; modify it; return it
# (assumes class_meth! returns the object)
end
A more appropriate implementation of #class_meth!, or #class_meth using one of the other forms to avoid modification of the current instance, is left as an exercise.
FWIW: Those are instance methods, which is appropriate, and not "class meth[ods]"; don't be confused by the ill-naming.
I am looking for a way to have, I would say synonym keys in the hash.
I want multiple keys to point to the same value, so I can read/write a value through any of these keys.
As example, it should work like that (let say :foo and :bar are synonyms)
hash[:foo] = "foo"
hash[:bar] = "bar"
puts hash[:foo] # => "bar"
Update 1
Let me add couple of details. The main reason why I need these synonyms, because I receive keys from external source, which I can't control, but multiple keys could actually be associated with the same value.
Rethink Your Data Structure
Depending on how you want to access your data, you can make either the keys or the values synonyms by making them an array. Either way, you'll need to do more work to parse the synonyms than the definitional word they share.
Keys as Definitions
For example, you could use the keys as the definition for your synonyms.
# Create your synonyms.
hash = {}
hash['foo'] = %w[foo bar]
hash
# => {"foo"=>["foo", "bar"]}
# Update the "definition" of your synonyms.
hash['baz'] = hash.delete('foo')
hash
# => {"baz"=>["foo", "bar"]}
Values as Definitions
You could also invert this structure and make your keys arrays of synonyms instead. For example:
hash = {["foo", "bar"]=>"foo"}
hash[hash.rassoc('foo').first] = 'baz'
=> {["foo", "bar"]=>"baz"}
You could subclass hash and override [] and []=.
class AliasedHash < Hash
def initialize(*args)
super
#aliases = {}
end
def alias(from,to)
#aliases[from] = to
self
end
def [](key)
super(alias_of(key))
end
def []=(key,value)
super(alias_of(key), value)
end
private
def alias_of(key)
#aliases.fetch(key,key)
end
end
ah = AliasedHash.new.alias(:bar,:foo)
ah[:foo] = 123
ah[:bar] # => 123
ah[:bar] = 456
ah[:foo] # => 456
What you can do is completely possible as long as you assign the same object to both keys.
variable_a = 'a'
hash = {foo: variable_a, bar: variable_a}
puts hash[:foo] #=> 'a'
hash[:bar].succ!
puts hash[:foo] #=> 'b'
This works because hash[:foo] and hash[:bar] both refer to the same instance of the letter a via variable_a. This however wouldn't work if you used the assignment hash = {foo: 'a', bar: 'a'} because in that case :foo and :bar refer to different instance variables.
The answer to your original post is:
hash[:foo] = hash[:bar]
and
hash[:foo].__id__ == hash[:bar].__id__it
will hold true as long as the value is a reference value (String, Array ...) .
The answer to your Update 1 could be:
input.reduce({ :k => {}, :v => {} }) { |t, (k, v)|
t[:k][t[:v][v] || k] = v;
t[:v][v] = k;
t
}[:k]
where «input» is an abstract enumerator (or array) of your input data as it comes [key, value]+, «:k» your result, and «:v» an inverted hash that serves the purpose of finding a key if its value is already present.
I need to populate a Hash with various values. Some of values are accessed often enough and another ones really seldom.
The issue is, I'm using some computation to get values and populating the Hash becomes really slow with multiple keys.
Using some sort of cache is not a option in my case.
I wonder how to make the Hash compute the value only when the key is firstly accessed and not when it is added?
This way, seldom used values wont slow down the filling process.
I'm looking for something that is "kinda async" or lazy access.
There are many different ways to approach this. I recommend using an instance of a class that you define instead of a Hash. For example, instead of...
# Example of slow code using regular Hash.
h = Hash.new
h[:foo] = some_long_computation
h[:bar] = another_long_computation
# Access value.
puts h[:foo]
... make your own class and define methods, like this...
class Config
def foo
some_long_computation
end
def bar
another_long_computation
end
end
config = Config.new
puts config.foo
If you want a simple way to cache the long computations or it absolutely must be a Hash, not your own class, you can now wrap the Config instance with a Hash.
config = Config.new
h = Hash.new {|h,k| h[k] = config.send(k) }
# Access foo.
puts h[:foo]
puts h[:foo] # Not computed again. Cached from previous access.
One issue with the above example is that h.keys will not include :bar because you haven't accessed it yet. So you couldn't, for example, iterate over all the keys or entries in h because they don't exist until they're actually accessed. Another potential issue is that your keys need to be valid Ruby identifiers, so arbitrary String keys with spaces won't work when defining them on Config.
If this matters to you, there are different ways to handle it. One way you can do it is to populate your hash with thunks and force the thunks when accessed.
class HashWithThunkValues < Hash
def [](key)
val = super
if val.respond_to?(:call)
# Force the thunk to get actual value.
val = val.call
# Cache the actual value so we never run long computation again.
self[key] = val
end
val
end
end
h = HashWithThunkValues.new
# Populate hash.
h[:foo] = ->{ some_long_computation }
h[:bar] = ->{ another_long_computation }
h["invalid Ruby name"] = ->{ a_third_computation } # Some key that's an invalid ruby identifier.
# Access hash.
puts h[:foo]
puts h[:foo] # Not computed again. Cached from previous access.
puts h.keys #=> [:foo, :bar, "invalid Ruby name"]
One caveat with this last example is that it won't work if your values are callable because it can't tell the difference between a thunk that needs to be forced and a value.
Again, there are ways to handle this. One way to do it would be to store a flag that marks whether a value has been evaluated. But this would require extra memory for every entry. A better way would be to define a new class to mark that a Hash value is an unevaluated thunk.
class Unevaluated < Proc
end
class HashWithThunkValues < Hash
def [](key)
val = super
# Only call if it's unevaluated.
if val.is_a?(Unevaluated)
# Force the thunk to get actual value.
val = val.call
# Cache the actual value so we never run long computation again.
self[key] = val
end
val
end
end
# Now you must populate like so.
h = HashWithThunkValues.new
h[:foo] = Unevaluated.new { some_long_computation }
h[:bar] = Unevaluated.new { another_long_computation }
h["invalid Ruby name"] = Unevaluated.new { a_third_computation } # Some key that's an invalid ruby identifier.
h[:some_proc] = Unevaluated.new { Proc.new {|x| x + 2 } }
The downside of this is that now you have to remember to use Unevaluted.new when populating your Hash. If you want all values to be lazy, you could override []= also. I don't think it would actually save much typing because you'd still need to use Proc.new, proc, lambda, or ->{} to create the block in the first place. But it might be worthwhile. If you did, it might look something like this.
class HashWithThunkValues < Hash
def []=(key, val)
super(key, val.respond_to?(:call) ? Unevaluated.new(&val) : val)
end
end
So here is the full code.
class HashWithThunkValues < Hash
# This can be scoped inside now since it's not used publicly.
class Unevaluated < Proc
end
def [](key)
val = super
# Only call if it's unevaluated.
if val.is_a?(Unevaluated)
# Force the thunk to get actual value.
val = val.call
# Cache the actual value so we never run long computation again.
self[key] = val
end
val
end
def []=(key, val)
super(key, val.respond_to?(:call) ? Unevaluated.new(&val) : val)
end
end
h = HashWithThunkValues.new
# Populate.
h[:foo] = ->{ some_long_computation }
h[:bar] = ->{ another_long_computation }
h["invalid Ruby name"] = ->{ a_third_computation } # Some key that's an invalid ruby identifier.
h[:some_proc] = ->{ Proc.new {|x| x + 2 } }
You can define your own indexer with something like this:
class MyHash
def initialize
#cache = {}
end
def [](key)
#cache[key] || (#cache[key] = compute(key))
end
def []=(key, value)
#cache[key] = value
end
def compute(key)
#cache[key] = 1
end
end
and use it as follows:
1.9.3p286 :014 > hash = MyHash.new
=> #<MyHash:0x007fa0dd03a158 #cache={}>
1.9.3p286 :019 > hash["test"]
=> 1
1.9.3p286 :020 > hash
=> #<MyHash:0x007fa0dd03a158 #cache={"test"=>1}>
you can use this:
class LazyHash < Hash
def [] key
(_ = (#self||{})[key]) ?
((self[key] = _.is_a?(Proc) ? _.call : _); #self.delete(key)) :
super
end
def lazy_update key, &proc
(#self ||= {})[key] = proc
self[key] = proc
end
end
Your lazy hash will behave exactly as a normal Hash, cause it is actually a real Hash.
See live demo here
*** UPDATE - answering to nested procs question ***
Yes, it would work, but it is cumbersome.
See updated answer.
Use lazy_update instead of []= to add "lazy" values to your hash.
This isn't strictly an answer to the body of your question, but Enumerable::Lazy will definitely be a part of Ruby 2.0. This will let you do lazy evaluation on iterator compositions:
lazy = [1, 2, 3].lazy.select(&:odd?)
# => #<Enumerable::Lazy: #<Enumerator::Generator:0x007fdf0b864c40>:each>
lazy.to_a
# => [40, 50]
Here's some example code:
class Obj
attr :c, true
def == that
p '=='
that.c == self.c
end
def <=> that
p '<=>'
that.c <=> self.c
end
def equal? that
p 'equal?'
that.c.equal? self.c
end
def eql? that
p 'eql?'
that.c.eql? self.c
end
end
a = Obj.new
b = Obj.new
a.c = 1
b.c = 1
p [a] | [b]
It prints 2 objects but it should print 1 object. None of the comparison methods get called. How is Array.| comparing for equality?
Array#| is implemented using hashs. So in order for your type to work well with it (as well as with hashmaps and hashsets), you'll have to implement eql? (which you did) and hash (which you did not). The most straight forward way to define hash meaningfully would be to just return c.hash.
Ruby's Array class is implemented in C, and from what I can tell, uses a custom hash table to check for equality when comparing objects in |. If you wanted to modify this behavior, you'd have to write your own version that uses an equality check of your choice.
To see the full implementation of Ruby's Array#|: click here and search for "rb_ary_or(VALUE ary1, VALUE ary2)"
Ruby is calling the hash functions and they are returning different values, because they are still just returning the default object_id. You will need to def hash and return something reflecting your idea of what makes an Obj significant.
>> class Obj2 < Obj
>> def hash; t = super; p ['hash: ', t]; t; end
>> end
=> nil
>> x, y, x.c, y.c = Obj2.new, Obj2.new, 1, 1
=> [#<Obj2:0x100302568 #c=1>, #<Obj2:0x100302540 #c=1>, 1, 1]
>> p [x] | [y]
["hash: ", 2149061300]
["hash: ", 2149061280]
["hash: ", 2149061300]
["hash: ", 2149061280]
[#<Obj2:0x100302568 #c=1>, #<Obj2:0x100302540 #c=1>]