Hash Enumerable methods: Inconsistent behavior when passing only one parameter - ruby

Ruby's enumerable methods for Hash expect 2 parameters, one for the key and one for the value:
hash.each { |key, value| ... }
However, I notice that the behavior is inconsistent among the enumerable methods when you only pass one parameter:
student_ages = {
"Jack" => 10,
"Jill" => 12,
}
student_ages.each { |single_param| puts "param: #{single_param}" }
student_ages.map { |single_param| puts "param: #{single_param}" }
student_ages.select { |single_param| puts "param: #{single_param}" }
student_ages.reject { |single_param| puts "param: #{single_param}" }
# results:
each...
param: ["Jack", 10]
param: ["Jill", 12]
map...
param: ["Jack", 10]
param: ["Jill", 12]
select...
param: Jack
param: Jill
reject...
param: Jack
param: Jill
As you can see, for each and map, the single parameter gets assigned to a [key, value] array, but for select and reject, the parameter is only the key.
Is there a particular reason for this behavior? The docs don't seem to mention this at all; all of the examples given just assume that you are passing in two parameters.

Just checked Rubinius behavior and it is indeed consistent with CRuby. So looking at the Ruby implementation - it is indeed because #select yields two values:
yield(item.key, item.value)
while #each yields an array with two values:
yield [item.key, item.value]
Yielding two values to a block that expects one takes the first argument and ignores the second one:
def foo
yield :bar, :baz
end
foo { |x| p x } # => :bar
Yielding an array will either get completely assigned if the block has one parameter or get unpacked and assigned to each individual value (as if you passed them one by one) if there are two or more parameters.
def foo
yield [:bar, :baz]
end
foo { |x| p x } # => [:bar, :baz]
As for why they made that descision - there probably isn't any good reason behind it, it just wasn't expected people to call them with one argument.

My guess is that internally map is just each with collect. Interesting they don't work quite the same way.
As to each...
The source code is below. It checks how many arguments you've passed into the block. If more than one it calls each_pair_i_fast, otherwise just each_pair_i.
static VALUE
rb_hash_each_pair(VALUE hash)
{
RETURN_SIZED_ENUMERATOR(hash, 0, 0, hash_enum_size);
if (rb_block_arity() > 1)
rb_hash_foreach(hash, each_pair_i_fast, 0);
else
rb_hash_foreach(hash, each_pair_i, 0);
return hash;
}
each_pair_i_fast returns two distinct values:
each_pair_i_fast(VALUE key, VALUE value)
{
rb_yield_values(2, key, value);
return ST_CONTINUE;
}
each_pair_i does not:
each_pair_i(VALUE key, VALUE value)
{
rb_yield(rb_assoc_new(key, value));
return ST_CONTINUE;
}
rb_assoc_new returns a two element array (at least I'm assuming that is what rb_ary_new3 does
rb_assoc_new(VALUE car, VALUE cdr)
{
return rb_ary_new3(2, car, cdr);
}
select looks like this:
rb_hash_select(VALUE hash)
{
VALUE result;
RETURN_SIZED_ENUMERATOR(hash, 0, 0, hash_enum_size);
result = rb_hash_new();
if (!RHASH_EMPTY_P(hash)) {
rb_hash_foreach(hash, select_i, result);
}
return result;
}
and select_i looks like this:
select_i(VALUE key, VALUE value, VALUE result)
{
if (RTEST(rb_yield_values(2, key, value))) {
rb_hash_aset(result, key, value);
}
return ST_CONTINUE;
}
And I'm going to assume that rb_hash_aset returns two distinct arguments similar to each_pair_i.
Most important notice that select/etc doesn't check the argument arity at all.
Sources:
https://github.com/ruby/ruby/blob/d5c5d5c778a0e8d61ab07669132dc18fb1a2e874/hash.c
https://github.com/ruby/ruby/blob/9f44b77a18d4d6099174c6044261eb1611a147ea/array.c

Related

How do you check for matching keys in a ruby hash?

I'm learning coding, and one of the assignments is to return keys is return the names of people who like the same TV show.
I have managed to get it working and to pass TDD, but I'm wondering if I've taken the 'long way around' and that maybe there is a simpler solution?
Here is the setup and test:
class TestFriends < MiniTest::Test
def setup
#person1 = {
name: "Rick",
age: 12,
monies: 1,
friends: ["Jay","Keith","Dave", "Val"],
favourites: {
tv_show: "Friends",
things_to_eat: ["charcuterie"]
}
}
#person2 = {
name: "Jay",
age: 15,
monies: 2,
friends: ["Keith"],
favourites: {
tv_show: "Friends",
things_to_eat: ["soup","bread"]
}
}
#person3 = {
name: "Val",
age: 18,
monies: 20,
friends: ["Rick", "Jay"],
favourites: {
tv_show: "Pokemon",
things_to_eat: ["ratatouille", "stew"]
}
}
#people = [#person1, #person2, #person3]
end
def test_shared_tv_shows
expected = ["Rick", "Jay"]
actual = tv_show(#people)
assert_equal(expected, actual)
end
end
And here is the solution that I found:
def tv_show(people_list)
tv_friends = {}
for person in people_list
if tv_friends.key?(person[:favourites][:tv_show]) == false
tv_friends[person[:favourites][:tv_show]] = [person[:name]]
else
tv_friends[person[:favourites][:tv_show]] << person[:name]
end
end
for array in tv_friends.values()
if array.length() > 1
return array
end
end
end
It passes, but is there a better way of doing this?
I think you could replace those for loops with the Array#each. But in your case, as you're creating a hash with the values in people_list, then you could use the Enumerable#each_with_object assigning a new Hash as its object argument, this way you have your own person hash from the people_list and also a new "empty" hash to start filling as you need.
To check if your inner hash has a key with the value person[:favourites][:tv_show] you can check for its value just as a boolean one, the comparison with false can be skipped, the value will be evaluated as false or true by your if statement.
You can create the variables tv_show and name to reduce a little bit the code, and then over your tv_friends hash to select among its values the one that has a length greater than 1. As this will give you an array inside an array you can get from this the first element with first (or [0]).
def tv_show(people_list)
tv_friends = people_list.each_with_object(Hash.new({})) do |person, hash|
tv_show = person[:favourites][:tv_show]
name = person[:name]
hash.key?(tv_show) ? hash[tv_show] << name : hash[tv_show] = [name]
end
tv_friends.values.select { |value| value.length > 1 }.first
end
Also you can omit parentheses when the method call doesn't have arguments.

Extending a ruby class (hash) with new function (recursive_merge)

If I want to recursively merge 2 hashes, I can do so with the following function:
def recursive_merge(a,b)
a.merge(b) {|key,a_item,b_item| recursive_merge(a_item,b_item) }
end
This works great, in that I can now do:
aHash = recursive_merge(aHash,newHash)
But I'd like to add this as a self-updating style method similar to merge!. I can add in the returning function:
class Hash
def recursive_merge(newHash)
self.merge { |key,a_item,b_item| a_item.recursive_merge(b_item) }
end
end
But am not sure how to re-create the bang function that updates the original object without association.
class Hash
def recursive_merge!(newHash)
self.merge { |key,a_item,b_item| a_item.recursive_merge(b_item) }
# How do I set "self" to this new hash?
end
end
edit example as per comments.
h={:a=>{:b => "1"}
h.recursive_merge!({:a=>{:c=>"2"})
=> {:a=>{:b=>"1", :c="2"}}
The regular merge results in :b=>"1" being overwritten by :c="2"
Use merge! rather than attempt to update self. I don't believe it makes sense to use merge! anywhere but at the top level, so I wouldn't call the bang version recursively. Instead, use merge! at the top level, and call the non-bang method recursively.
It may also be wise to check both values being merged are indeed hashes, otherwise you may get an exception if you attempt to recursive_merge on a non-hash object.
#!/usr/bin/env ruby
class Hash
def recursive_merge(other)
self.merge(other) { |key, value1, value2| value1.is_a?(Hash) && value2.is_a?(Hash) ? value1.recursive_merge(value2) : value2}
end
def recursive_merge!(other)
self.merge!(other) { |key, value1, value2| value1.is_a?(Hash) && value2.is_a?(Hash) ? value1.recursive_merge(value2) : value2}
end
end
h1 = { a: { b:1, c:2 }, d:1 }
h2 = { a: { b:2, d:4 }, d:2 }
h3 = { d: { b:1, c:2 } }
p h1.recursive_merge(h2) # => {:a=>{:b=>2, :c=>2, :d=>4}, :d=>2}
p h1.recursive_merge(h3) # => {:a=>{:b=>1, :c=>2}, :d=>{:b=>1, :c=>2}}
p h1.recursive_merge!(h2) # => {:a=>{:b=>2, :c=>2, :d=>4}, :d=>2}
p h1 # => {:a=>{:b=>2, :c=>2, :d=>4}, :d=>2}
If you have a specific reason to fully merge in place, possibly for speed, you can experiment with making the second function call itself recursively, rather than delegate the recursion to the first function. Be aware that may produce unintended side effects if the hashes store shared objects.
Example:
h1 = { a:1, b:2 }
h2 = { a:5, c:9 }
h3 = { a:h1, b:h2 }
h4 = { a:h2, c:h1 }
p h3.recursive_merge!(h4)
# Making recursive calls to recursive_merge
# => {:a=>{:a=>5, :b=>2, :c=>9}, :b=>{:a=>5, :c=>9}, :c=>{:a=>1, :b=>2}}
# Making recursive calls to recursive_merge!
# => {:a=>{:a=>5, :b=>2, :c=>9}, :b=>{:a=>5, :c=>9}, :c=>{:a=>5, :b=>2, :c=>9}}
As you can see, the second (shared) copy of h1 stored under the key :c is updated to reflect the merge of h1 and h2 under the key :a. This may be surprising and unwanted. Hence why I recommend using recursive_merge for the recursion, and not recursive_merge!.

Passing a method that takes arguments as an argument in Ruby

I'd like to calculate the difference for various values inside 2 hashes with the same structure, as concisely as possible. Here's a simplified example of the data I'd like to compare:
hash1 = {"x" => { "y" => 20 } }
hash2 = {"x" => { "y" => 12 } }
I have a very simple method to get the value I want to compare. In reality, the hash can be nested a lot deeper than these examples, so this is mostly to keep the code readable:
def get_y(data)
data["x"]["y"]
end
I want to create a method that will calculate the difference between the 2 values, and can take a method like get_y as an argument, allowing me to re-use the code for any value in the hash. I'd like to be able to call something like this, and I'm not sure how to write the method get_delta:
get_delta(hash1, hash2, get_y) # => 8
The "Ruby way" would be to pass a block:
def get_delta_by(obj1, obj2)
yield(obj1) - yield(obj2)
end
hash1 = {"x" => { "y" => 20 } }
hash2 = {"x" => { "y" => 12 } }
get_delta_by(hash1, hash2) { |h| h["x"]["y"] }
#=> 8
A method could be passed (indirectly) via:
def get_y(data)
data["x"]["y"]
end
get_delta_by(hash1, hash2, &method(:get_y))
#=> 8
Building on Stefan's response, if you want a more flexible get method you can actually return a lambda from the function and pass arguments for what you want to get. This will let you do error handling nicely:
Starting with the basics from above...
def get_delta_by(obj1, obj2)
yield(obj1) - yield(obj2)
end
hash1 = {"x" => { "y" => 20 } }
hash2 = {"x" => { "y" => 12 } }
get_delta_by(hash1, hash2) { |h| h["x"]["y"] }
Then we can define a get_something function which takes a list of arguments for the path of the element to get:
def get_something(*args)
lambda do |data|
args.each do |arg|
begin
data = data.fetch(arg)
rescue KeyError
raise RuntimeError, "KeyError for #{arg} on path #{args.join(',')}"
end
end
return data
end
end
Finally we call the function using the ampersand to pass the lambda as a block:
lambda_getter = get_something("x","y")
get_delta_by(hash1, hash2, &lambda_getter)
That last bit can be a one liner... but wrote it as two for clarity here.
In Ruby 2.3, you can use Hash#dig method, if it meets your needs.
hash1.dig("x", "y") - hash2.dig("x", "y")
#=> 8

Easy way to parse hashes and arrays

Typically, parsing XML or JSON returns a hash, array, or combination of them. Often, parsing through an invalid array leads to all sorts of TypeErrors, NoMethodErrors, unexpected nils, and the like.
For example, I have a response object and want to find the following element:
response['cars'][0]['engine']['5L']
If response is
{ 'foo' => { 'bar' => [1, 2, 3] } }
it will throw a NoMethodError exception, when all I want is to see is nil.
Is there a simple way to look for an element without resorting to lots of nil checks, rescues, or Rails try methods?
Casper was just before me, he used the same idea (don't know where i found it, is a time ago) but i believe my solution is more sturdy
module DeepFetch
def deep_fetch(*keys, &fetch_default)
throw_fetch_default = fetch_default && lambda {|key, coll|
args = [key, coll]
# only provide extra block args if requested
args = args.slice(0, fetch_default.arity) if fetch_default.arity >= 0
# If we need the default, we need to stop processing the loop immediately
throw :df_value, fetch_default.call(*args)
}
catch(:df_value){
keys.inject(self){|value,key|
block = throw_fetch_default && lambda{|*args|
# sneak the current collection in as an extra block arg
args << value
throw_fetch_default.call(*args)
}
value.fetch(key, &block) if value.class.method_defined? :fetch
}
}
end
# Overload [] to work with multiple keys
def [](*keys)
case keys.size
when 1 then super
else deep_fetch(*keys){|key, coll| coll[key]}
end
end
end
response = { 'foo' => { 'bar' => [1, 2, 3] } }
response.extend(DeepFetch)
p response.deep_fetch('cars') { nil } # nil
p response.deep_fetch('cars', 0) { nil } # nil
p response.deep_fetch('foo') { nil } # {"bar"=>[1, 2, 3]}
p response.deep_fetch('foo', 'bar', 0) { nil } # 1
p response.deep_fetch('foo', 'bar', 3) { nil } # nil
p response.deep_fetch('foo', 'bar', 0, 'engine') { nil } # nil
I tried to look through both the Hash documentation and also through Facets, but nothing stood out as far as I could see.
So you might want to implement your own solution. Here's one option:
class Hash
def deep_index(*args)
args.inject(self) { |e,arg|
break nil if e[arg].nil?
e[arg]
}
end
end
h1 = { 'cars' => [{'engine' => {'5L' => 'It worked'}}] }
h2 = { 'foo' => { 'bar' => [1, 2, 3] } }
p h1.deep_index('cars', 0, 'engine', '5L')
p h2.deep_index('cars', 0, 'engine', '5L')
p h2.deep_index('foo', 'bonk')
Output:
"It worked"
nil
nil
If you can live with getting an empty hash instead of nil when there is no key, then you can do it like this:
response.fetch('cars', {}).fetch(0, {}).fetch('engine', {}).fetch('5L', {})
or save some types by defining a method Hash#_:
class Hash; def _ k; fetch(k, {}) end end
response._('cars')._(0)._('engine')._('5L')
or do it at once like this:
["cars", 0, "engine", "5L"].inject(response){|h, k| h.fetch(k, {})}
For the sake of reference, there are several projects i know of that tackle the more general problem of chaining methods in the face of possible nils:
andand
ick
zucker's egonil
methodchain
probably others...
There's also been considerable discussion in the past:
Ruby nil-like object - One of many on SO
Null Objects and Falsiness - Great article by Avdi Grimm
The 28 Bytes of Ruby Joy! - Very interesting discussion following J-_-L's post
More idiomatic way to avoid errors when calling method on variable that may be nil? on ruby-talk
et cetera
Having said that, the answers already provided probably suffice for the more specific problem of chained Hash#[] access.
I would suggest an approach of injecting custom #[] method to instances we are interested in:
def weaken_checks_for_brackets_accessor inst
inst.instance_variable_set(:#original_get_element_method, inst.method(:[])) \
unless inst.instance_variable_get(:#original_get_element_method)
singleton_class = class << inst; self; end
singleton_class.send(:define_method, :[]) do |*keys|
begin
res = (inst.instance_variable_get(:#original_get_element_method).call *keys)
rescue
end
weaken_checks_for_brackets_accessor(res.nil? ? inst.class.new : res)
end
inst
end
Being called on the instance of Hash (Array is OK as all the other classes, having #[] defined), this method stores the original Hash#[] method unless it is already substituted (that’s needed to prevent stack overflow during multiple calls.) Then it injects the custom implementation of #[] method, returning empty class instance instead of nil/exception. To use the safe value retrieval:
a = { 'foo' => { 'bar' => [1, 2, 3] } }
p (weaken_checks_for_brackets_accessor a)['foo']['bar']
p "1 #{a['foo']}"
p "2 #{a['foo']['bar']}"
p "3 #{a['foo']['bar']['ghgh']}"
p "4 #{a['foo']['bar']['ghgh'][0]}"
p "5 #{a['foo']['bar']['ghgh'][0]['olala']}"
Yielding:
#⇒ [1, 2, 3]
#⇒ "1 {\"bar\"=>[1, 2, 3]}"
#⇒ "2 [1, 2, 3]"
#⇒ "3 []"
#⇒ "4 []"
#⇒ "5 []"
Since Ruby 2.3, the answer is dig

RoR different bracket notation

I'm getting to grips with rails and whilst I feel I am progressing there is one thing that I am struggling to get to grips with and it's very basic. I am trying to understand the different usage of [] {} and () Are there any good sources of their usage and are there any tips you can give to a beginner in recognizing when to use one or the other, or as I seem to see in some cases when they are not required at all?
I know this is extremely basic but I have struggled to find literature which explains concisely the interplay between them and Ruby or specifically RoR
It has nothing to do with RoR; the various brackets are Ruby language constructs.
[] is the array operator, for arrays and other classes that implement it (like a string taking a range to get substrings, or hashes to look up a key's value):
a = [1, 2, 3]
a.each { |n| puts n }
s = "ohai"
puts s[1..-1]
h = { foo: "bar", baz: "plugh" }
puts h[:foo]
{} is for hashes, and one of two ways of delimiting blocks (the other being begin/end). (And used with # for string interpolation.)
h = { foo: "bar", baz: "plugh" }
h.each { |k, v| puts "#{k} == #{v}" }
() is for method parameters, or for enforcing evaluation order in an expression.
> puts 5 * 3 + 5 # Normal precedence has * ahead of +
=> 20
> puts 5 * (3 + 5) # Force 3+5 to be evaluated first
=> 40
def foo(s)
puts(s)
end
They're sometimes optional if the statement has no ambiguity:
def foo s
puts s
end
(They're not always optional, and putting a space between the method call and its parenthetical parameter list can cause issues--best not to, IMO.)
(I probably missed something, too, but there's the nutshell.)
[] are used to access objects within a hash (via a key) or within an array (via an index).
hash[:key] # returns a value
array[0] # returns the first array element
[] is used to describe an array.
array = ['a', 'b', 'c']
Of course this can be nested.
nested = [['a','b','c'], [1,2,3]]
[] can be used to declare a hash, but that's because the Hash class can accept an array.
hash = Hash[['a',1], ['b',2]] # { 'a' => 1, 'b', => 2 }
{} is used to declare a hash.
hash = { 'a' => 1, 'b' => 2 }
This too can be nested.
hash = { 'a' => { 'c' => 3 }, 'b' => { 'd' => 4 } }
{} is also used to delimit blocks. The .each method is a common one. The following two blocks of code are equivalent.
array.each do |n|
puts n
end
array.each { |n| puts n }
The () is just used for grouping in cases where ambiguity needs clarification. This is especially true in methods that take many arguments, some of which may be nil, some of which may be obejcts, etc. You'll see a lot of code that omit them entirely as no grouping is needed for clarity.
puts(string)
puts string
I recommend firing up the rails console and start declaring variables and accessing them.

Resources