ruby hash merge with a block - ruby

Trying to use Ruby's Hash#merge! on multiple hashes, starting with an empty hash:
a = {}
b = {x: 1.2, y: 1.3}
c = {x: 1.4, y: 1.5}
fact = 100 # need to multiply values that are merged in with this
a.merge!(b) {|k,v1,v2| v1 + v2 * fact} # it doesn't multiply values with fact
a.merge!(c) {|k,v1,v2| v1 + v2 * fact} #it does multiply values with fact
So the first merge does not give me the result I was expecting, while the second merge does. Please note that in the real app the keys are not limited to x and y; there can be many different keys.

The first merge works as described in the documentation.
The block is invoked only to resolve conflicts, that is, when a key is present in both hashes. On the first call to Hash#merge!, a is empty, hence no conflict occurs and the content of b is copied into a without any changes.
You can fix the code by initializing a with {x: 0, y: 0}.
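A minimal sketch of both points, assuming you want every incoming value scaled by fact whether or not its key is already present. The counter shows that the block never runs when merging into an empty hash. The hypothetical helper merge_scaled then sidesteps the problem for arbitrary keys by scaling each hash with Hash#transform_values (Ruby 2.4+) before merging, so the block only has to add values together. This is a swapped-in technique, not the answerer's code.

```ruby
fact = 100
b = {x: 1.2, y: 1.3}
c = {x: 1.4, y: 1.5}

# Count how often the conflict block runs when merging into an empty hash.
calls = 0
a = {}
a.merge!(b) { |_key, v1, v2| calls += 1; v1 + v2 * fact }
# calls is still 0: a had no keys, so b was copied in unchanged.

# Hypothetical helper: scale each incoming hash first, then let the block
# add the (already scaled) values for keys present in both hashes.
def merge_scaled(hashes, fact)
  hashes.each_with_object({}) do |g, h|
    h.merge!(g.transform_values { |v| v * fact }) { |_key, old_val, new_val| old_val + new_val }
  end
end

result = merge_scaled([b, c], fact)
# result[:x] is approximately 260.0 and result[:y] approximately 280.0
```

Because the scaling happens before the merge, no pre-initialized keys like {x: 0, y: 0} are needed.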

I would be inclined to perform the merge as follows.
a = {}
b = {x: 1.2, y: 1.3}
c = {x: 1.4, y: 1.5}
[b, c].each_with_object(a) { |g,h| h.update(g) { |_,o,n| o+n } }.
tap { |h| h.keys.each { |k| h[k] *= 10 } }
#=> {:x=>25.999999999999996, :y=>28.0}
Note that this works with any number of hashes (b, c, d, ...) and any number of keys ({ x: 1.2, y: 1.3, z: 2.1, ... }).
The steps are as follows (see note 1).
e = [b, c].each_with_object(a)
#=> #<Enumerator: [{:x=>1.2, :y=>1.3}, {:x=>1.4, :y=>1.5}]:each_with_object({})>
We can see the values that will be generated by this enumerator by applying Enumerable#entries (see note 2):
e.entries
#=> [[{:x=>1.2, :y=>1.3}, {}], [{:x=>1.4, :y=>1.5}, {}]]
We can use Enumerator#next to generate the first value of e and assign the two block variables to it (that is, "pass e.next to the block"):
g,h = e.next
#=> [{:x=>1.2, :y=>1.3}, {}]
g #=> {:x=>1.2, :y=>1.3}
h #=> {}
Next we perform the block calculation.
f = h.update(g) { |_,o,n| o+n }
#=> {:x=>1.2, :y=>1.3}
Here I have used the form of Hash#update (aka merge!) which employs a block to determine the values of keys that are present in both hashes being merged. (See the doc for details.) As h is initially empty (no keys), the block is not used for this merge.
The next and last value of e is now generated and the process is repeated.
g,h = e.next
#=> [{:x=>1.4, :y=>1.5}, {:x=>1.2, :y=>1.3}]
g #=> {:x=>1.4, :y=>1.5}
h #=> {:x=>1.2, :y=>1.3}
f = h.update(g) { |_,o,n| o+n }
#=> {:x=>2.5999999999999996, :y=>2.8}
Since g and h both have a key :x, the block is used to determine the new value of h[:x]:
_ #=> :x
o #=> 1.2
n #=> 1.4
h[:x] = o + n
#=> 2.5999999999999996
Similarly, h[:y] = 2.8.
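To see which value lands in which block variable, the sketch below records every block call made by Hash#update. It shows that o is the old value already in the receiver h and n is the new value from the argument g. The array name seen is only for illustration.

```ruby
h = {x: 1.2, y: 1.3}
g = {x: 1.4, y: 1.5}
seen = [] # records [key, o, n] for each key present in both hashes

h.update(g) do |key, o, n|
  seen << [key, o, n]
  o + n
end

# seen == [[:x, 1.2, 1.4], [:y, 1.3, 1.5]]
# o comes from the receiver h, n comes from the argument g
```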
The last step uses Object#tap to multiply each value by 10.
f.tap { |h| h.keys.each { |k| h[k] *= 10 } }
#=> {:x=>25.999999999999996, :y=>28.0}
tap does nothing more than save a line of code and the creation of a local variable, as I could have instead written:
h = [b, c].each_with_object(a) { |g,h| h.update(g) { |_,o,n| o+n } }
h.keys.each { |k| h[k] *= 10 }
h
Another option (that does not use tap) is to write:
f = [b, c].flat_map(&:keys).uniq.product([0]).to_h
#=> {:x=>0, :y=>0}
[b, c].each_with_object(f) { |g,h| h.update(g) { |_,o,n| o+10*n } }
#=> {:x=>26.0, :y=>28.0}
1 Experienced Rubyists: GORY DETAIL ALERT!
2 Enumerable#to_a could also be used here.

Related

When is a block or object that is passed to Hash.new created or run?

I'm going through ruby koans and I am having a little trouble understanding when this code will be run:
hash = Hash.new {|hash, key| hash[key] = [] }
If there are no values in the hash, when does the new array get assigned to a given key in the Hash? Does it happen the first time a hash value is accessed without first assigning it? Please help me understand when exactly default values are created for any given hash key.
For the benefit of those new to Ruby, I have discussed alternative approaches to the problem, including the one that is the substance of this question.
The task
Suppose you are given an array
arr = [[:dog, "fido"], [:car, "audi"], [:cat, "lucy"], [:dog, "diva"], [:cat, "bo"]]
and wish to create the hash
{ :dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"] }
First solution
h = {}
arr.each do |k,v|
h[k] = [] unless h.key?(k)
h[k] << v
end
h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}
This is quite straightforward.
Second solution
More Ruby-like is to write:
h = {}
arr.each { |k,v| (h[k] ||= []) << v }
h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}
When Ruby sees (h[k] ||= []) << v, the first thing she does is (roughly) expand it to
(h[k] = h[k] || []) << v
If h does not have a key k, h[k] #=> nil, so the expression becomes
(h[k] = nil || []) << v
which becomes
(h[k] = []) << v
so
h[k] #=> [v]
Note that h[k] on the left of the assignment uses the method Hash#[]=, whereas h[k] on the right employs Hash#[].
This solution requires that none of the hash values be nil or false, since ||= would overwrite them.
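A quick check of that caveat: Ruby treats both nil and false as falsy, so ||= replaces either one with a fresh array. The keys :flag and :missing_value are made up for illustration.

```ruby
h = {flag: false, missing_value: nil}

(h[:flag] ||= []) << "a"          # false is falsy, so it is replaced by []
(h[:missing_value] ||= []) << "b" # nil is falsy too

# h == {flag: ["a"], missing_value: ["b"]}: the stored false and nil are gone
```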
Third solution
A third approach is to give the hash a default value. If a hash h does not have a key k, h[k] returns the default value. There are two types of default values.
Passing the default value as an argument to Hash::new
If an empty array is passed as an argument to Hash::new, that value becomes the default value:
a = []
a.object_id
#=> 70339916855860
g = Hash.new(a)
#=> {}
g[k] returns [] when g does not have a key k. (The hash is not altered, however.) This construct has important uses, but it is inappropriate here. To see why, suppose we write
x = g[:cat] << "bo"
#=> ["bo"]
y = g[:dog] << "diva"
#=> ["bo", "diva"]
x #=> ["bo", "diva"]
This is because g[:cat] and g[:dog] both return the same object, the default array (g itself never gets any keys). We can see this by examining object_ids:
x.object_id
#=> 70339916855860
y.object_id
#=> 70339916855860
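A small sketch confirming both points without relying on object_id values: the two lookups return one shared default object, and g itself stays empty. Hash#default returns that shared object.

```ruby
g = Hash.new([]) # one array object serves as the default for every missing key

x = g[:cat] << "bo"
y = g[:dog] << "diva"

x.equal?(y) # same object, so both "values" show ["bo", "diva"]
g           # still {}: reading a default never stores a key
g.default   # the shared default array, now ["bo", "diva"]
```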
Giving Hash::new a block which returns the default value
The second form of default value is to perform a block calculation. If we define the hash with a block:
h = Hash.new { |h,k| h[k] = [] }
then if h does not have a key k, h[k] will be set equal to the value returned by the block, in this case an empty array. Note that the block variable h is the newly-created empty hash. This allows us to write
h = Hash.new { |h,k| h[k] = [] }
arr.each { |k,v| h[k] << v }
h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}
As the first element passed to the block is arr.first, the block variables are assigned values by evaluating
k, v = arr.first
#=> [:dog, "fido"]
k #=> :dog
v #=> "fido"
The block calculation is therefore
h[k] << v
#=> h[:dog] << "fido"
but since h does not (yet) have a key :dog, the block is triggered, setting h[k] equal to [], and then "fido" is appended to that empty array, so that
h #=> { :dog=>["fido"] }
Similarly, after the next two elements of arr are passed to the block we have
h #=> { :dog=>["fido"], :car=>["audi"], :cat=>["lucy"] }
When the next (fourth) element of arr is passed to the block, we evaluate
h[:dog] << "diva"
but now h does have a key :dog, so the block is not triggered and we end up with
h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy"]}
The last element of arr is processed similarly.
Note that, when using Hash::new with a block, we could write something like this:
h = Hash.new { launch_missiles("any time now") }
in which case h[k] would return whatever launch_missiles returns, without storing it, since this block never assigns h[k]. In other words, anything can be done in the block.
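A short sketch of when the block actually runs: only on a lookup of a missing key, and it only stores something if it assigns into the hash itself. The counter calls and the names storing and non_storing are illustrative.

```ruby
calls = 0
storing = Hash.new { |hash, key| calls += 1; hash[key] = [] }

storing[:dog] << "fido" # :dog missing: block runs and stores []
storing[:dog] << "diva" # :dog present: block is not called
# calls == 1, storing == {dog: ["fido", "diva"]}

# This block returns a fresh [] but never assigns it to the hash.
non_storing = Hash.new { |_hash, _key| [] }
non_storing[:cat] << "lucy"
# non_storing is still {}: the array that received "lucy" was discarded
```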
Even more Ruby-like
Lastly, the more Ruby-like way of writing
h = Hash.new { |h,k| h[k] = [] }
arr.each { |k,v| h[k] << v }
h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}
is to use Enumerable#each_with_object:
arr.each_with_object(Hash.new { |h,k| h[k] = [] }) { |(k,v),h| h[k] << v }
#=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}
which eliminates two lines of code.
Which is best?
Personally, I am indifferent to the second and third solutions. Both are used in practice.
The block is called when you look up a key that is not yet in the hash. In that specific case:
hash["d"] #calls the block, which stores [] as the value of key "d"
hash["d"] #=> [] (the key now exists, so the block is not called again)
For more information, visit: https://docs.ruby-lang.org/en/2.0.0/Hash.html
If a block is specified, it will be called with the hash object and the key, and should return the default value. It is the block's responsibility to store the value in the hash if required.
Makes life easier
This is syntactic sugar for those times that you have a hash whose values are all arrays and you don't want to check each time to see if the hash key is already there and the empty array is already initialized before adding new elements. It allows this:
hash[:new_key] << new_element
instead of this:
hash[:new_key] = [] unless hash[:new_key]
hash[:new_key] << new_element
Solves an older problem
It's also an alternative to the simpler way of specifying a default value for hashes, which looks like this:
hash = Hash.new([])
The problem with this approach is that the same array object is used as the default for all keys. So
hash = Hash.new([])
hash[:a] << 1
hash[:b] << 2
leaves hash itself empty, yet hash[:a], hash[:b], and even hash[:foo] all return [1, 2]. This is not usually the desired/expected behavior.

ruby syntax code involving hashes

I was looking at code regarding how to return a mode from an array and I ran into this code:
def mode(array)
answer = array.inject ({}) { |k, v| k[v]=array.count(v);k}
answer.select { |k,v| v == answer.values.max}.keys
end
I'm trying to conceptualize what the syntax means behind it as I am fairly new to Ruby and don't exactly understand how hashes are being used here. Any help would be greatly appreciated.
Line by line:
answer = array.inject ({}) { |k, v| k[v]=array.count(v);k}
This assembles a hash of counts. I would not have called the variable answer because it is not the answer, it is an intermediary step. The inject() method (also known as reduce()) allows you to iterate over a collection, keeping an accumulator (e.g. a running total or in this case a hash collecting counts). It needs a starting value of {} so that the hash exists when attempting to store a value. Given the array [1,2,2,2,3,4,5,6,6] the counts would look like this: {1=>1, 2=>3, 3=>1, 4=>1, 5=>1, 6=>2}.
answer.select { |k,v| v == answer.values.max}.keys
This selects all elements in the above hash whose value is equal to the maximum value, in other words the highest count. Then it extracts the keys associated with that maximum. Note that it will list multiple keys if they share the maximum value.
An alternative:
If you didn't care about returning multiple modes, you could use group_by as follows:
array.group_by{|x|x}.values.max_by(&:size).first
or, in Ruby 2.2+:
array.group_by(&:itself).values.max_by(&:size).first
The inject method acts like an accumulator. Here is a simpler example:
sum = [1,2,3].inject(0) { |current_tally, new_value| current_tally + new_value }
The 0 is the starting point.
So after the first line, we have a hash that maps each number to the number of times it appears.
The mode calls for the most frequent element, and that is what the next line does: it selects only the elements whose count equals the maximum.
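The same two steps can be sketched more compactly, assuming Ruby 2.7+ for Enumerable#tally. Like the original, it returns every value that shares the highest count; it also computes the maximum once instead of inside the select block. The method name modes is illustrative.

```ruby
def modes(array)
  counts = array.tally        # e.g. {1=>1, 2=>3, 3=>1, 4=>1, 5=>1, 6=>2}
  top = counts.values.max     # highest count, computed once
  counts.select { |_value, count| count == top }.keys
end

modes([1, 2, 2, 2, 3, 4, 5, 6, 6]) # => [2]
modes([1, 1, 2, 2, 3])             # => [1, 2]
```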
I believe your question has been answered, and @Mark mentioned different ways to do the calculations. I would like to just focus on other ways to improve the first line of code:
answer = array.inject ({}) { |k, v| k[v] = array.count(v); k }
First, let's create some data:
array = [1,2,1,4,3,2,1]
Use each_with_object instead of inject
My suspicion is that the code might be fairly old, as Enumerable#each_with_object, which was introduced in v. 1.9, is arguably a better choice here than Enumerable#inject (aka reduce). If we were to use each_with_object, the first line would be:
answer = array.each_with_object ({}) { |v,k| k[v] = array.count(v) }
#=> {1=>3, 2=>2, 4=>1, 3=>1}
each_with_object returns the object, a hash held by the block variable k.
As you see, each_with_object is very similar to inject, the only differences being:
it is not necessary to return the object k from the block with each_with_object, as it is with inject (the reason for that annoying ; k at the end of inject's block);
the block variable for the object (k) follows v with each_with_object, whereas it precedes v with inject; and
when not given a block, each_with_object returns an enumerator, meaning it can be chained to other methods (e.g., arr.each_with_object({}).with_index ...).
Don't get me wrong, inject remains an extremely powerful method, and in many situations it has no peer.
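A side-by-side sketch of the first two differences, using the same data. It also shows what happens if the trailing memo is left off inject's block: the Integer returned by the assignment becomes the next memo, so the second iteration raises NoMethodError. The variable names are illustrative.

```ruby
array = [1, 2, 1, 4, 3, 2, 1]

# inject: memo first, and the block must return it explicitly.
via_inject = array.inject({}) { |memo, v| memo[v] = array.count(v); memo }

# each_with_object: memo second, returned automatically.
via_ewo = array.each_with_object({}) { |v, memo| memo[v] = array.count(v) }

# Dropping "; memo" makes the assignment's Integer result the next memo.
failed = begin
  array.inject({}) { |memo, v| memo[v] = array.count(v) }
  false
rescue NoMethodError
  true
end
```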
Two more improvements
In addition to replacing inject with each_with_object, let me make two other changes:
answer = array.uniq.each_with_object ({}) { |k,h| h[k] = array.count(k) }
#=> {1=>3, 2=>2, 4=>1, 3=>1}
In the original expression, the object returned by inject (sometimes called the "memo") was represented by the block variable k, which I am using to represent a hash key ("k" for "key"). Similarly, as the object is a hash, I chose to use h for its block variable. Like many others, I prefer to keep the block variables short and use names that indicate object type (e.g., a for array, h for hash, s for string, sym for symbol, and so on).
Now suppose:
array = [1,1]
then inject would pass the first 1 into the block and then compute k[1] = array.count(1) #=> 2, so the hash k returned to inject would be {1=>2}. It would then pass the second 1 into the block, again compute k[1] = array.count(1) #=> 2, overwriting 1=>2 in k with 1=>2; that is, not changing it at all. Doesn't it make more sense to just do this for the unique values of array? That's why I have: array.uniq....
Even better: use a counting hash
This is still quite inefficient: array.count scans the whole array once for every unique element. Here's a way that reads better and is probably more efficient:
array.each_with_object(Hash.new(0)) { |k,h| h[k] += 1 }
#=> {1=>3, 2=>2, 4=>1, 3=>1}
Let's have a look at this in gory detail. Firstly, the docs for Hash#new read, "If obj is specified [i.e., Hash.new(obj)], this single object will be used for all default values." This means that if:
h = Hash.new('cat')
and h does not have a key 'dog', then:
h['dog'] #=> 'cat'
Important: The last expression is often misunderstood. It merely returns the default value. str = "It does *not* add the key-value pair 'dog'=>'cat' to the hash." Let me repeat that: puts str.
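The point is easy to verify directly. In this sketch, the default is returned but no key is created, and Hash#fetch (which ignores the hash's default value) confirms that 'dog' is absent.

```ruby
h = Hash.new('cat')

h['dog']              # => "cat", the default value
h.key?('dog')         # => false: no key was added
h                     # => {}
h.fetch('dog', 'none') # => "none": fetch does not use the hash's default
```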
Now let's see what's happening here:
enum = array.each_with_object(Hash.new(0))
#=> #<Enumerator: [1, 2, 1, 4, 3, 2, 1]:each_with_object({})>
We can see the contents of the enumerator by converting it to an array:
enum.to_a
#=> [[1, {}], [2, {}], [1, {}], [4, {}], [3, {}], [2, {}], [1, {}]]
These seven elements are passed into the block by the method each:
enum.each { |k,h| h[k] += 1 }
#=> {1=>3, 2=>2, 4=>1, 3=>1}
Pretty cool, eh?
We can simulate this using Enumerator#next. The first value of enum ([1, {}]) is passed to the block and assigned to the block variables:
k,h = enum.next
#=> [1, {}]
k #=> 1
h #=> {}
and we compute:
h[k] += 1
#=> h[k] = h[k] + 1 (what '+=' means)
# = 0 + 1 = 1 (h[k] on the right equals the default value
# of 0 since `h` has no key `k`)
so now:
h #=> {1=>1}
Next, each passes the second value of enum into the block and similar calculations are performed:
k,h = enum.next
#=> [2, {1=>1}]
k #=> 2
h #=> {1=>1}
h[k] += 1
#=> 1
h #=> {1=>1, 2=>1}
Things are a little different when the third element of enum is passed in, because h now has a key 1:
k,h = enum.next
#=> [1, {1=>1, 2=>1}]
k #=> 1
h #=> {1=>1, 2=>1}
h[k] += 1
#=> h[k] = h[k] + 1
#=> h[1] = h[1] + 1
#=> h[1] = 1 + 1 => 2
h #=> {1=>2, 2=>1}
The remaining calculations are performed similarly.
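On Ruby 2.7+, Enumerable#tally builds the same counts. A sketch comparing the two, assuming that Ruby version; note that the counting hash keeps its default of 0 while tally returns a plain hash (Hash#== compares contents only, not defaults).

```ruby
array = [1, 2, 1, 4, 3, 2, 1]

counting = array.each_with_object(Hash.new(0)) { |k, h| h[k] += 1 }

counting == array.tally # => true, both are {1=>3, 2=>2, 4=>1, 3=>1}
counting[99]            # => 0, the counting hash keeps its default
array.tally[99]         # => nil, tally returns a hash without a default
```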

How to find duplicate elements in a string array and store the result in another array

I have a string array:
Array1 = ["ab", "cd", "ab", "cd", "ef"]
I am trying to find all duplicate elements in Array1 and store them in another array Array2. In this case "ab" and "cd" should be stored in Array2.
I tried this:
Array2 = Array1.detect{ |c| Array1.count(c) > 1 }
But it returns only the first duplicate element.
This is one way.
arr = ["ab", "cd", "ab", "cd", "ef"]
arr.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
.select { |_,v| v > 1 }
.keys
#=> ["ab", "cd"]
We first use Enumerable#each_with_object to compute:
g = arr.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
#=> {"ab"=>2, "cd"=>2, "ef"=>1}
then Hash#select (not Enumerable#select) to obtain:
h = g.select { |_,v| v > 1 }
#=> {"ab"=>2, "cd"=>2}
and lastly, Hash#keys to extract the keys from h:
h.keys
#=> ["ab", "cd"]
Another way is to use a set:
require 'set'
s = Set.new
arr.select { |e| !s.add?(e) }.uniq
#=> ["ab", "cd"]
Set#add? attempts to add the value of e to the set. If it is able to do so, it returns self (which is truthy); if it's unable to do so (because the value of e is already in the set, implying that there are at least two elements of arr with the value of e), it returns nil. In the latter case we want to select that element of arr, so we write !s.add?(e), which evaluates to true.
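A small sketch of the two return values of Set#add?, followed by the selection step on an array where "ab" occurs three times. select keeps every repeat occurrence, which is why the final uniq is needed.

```ruby
require 'set'

s = Set.new
first  = s.add?("ab") # returns s itself (truthy): "ab" was new
second = s.add?("ab") # returns nil: "ab" was already present

arr = ["ab", "cd", "ab", "cd", "ef", "ab"]
seen = Set.new
repeats = arr.select { |e| !seen.add?(e) } # every repeat occurrence
dups = repeats.uniq
# repeats == ["ab", "cd", "ab"], dups == ["ab", "cd"]
```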
array2 = array1.select { |item| array1.count(item) > 1 }.uniq
You can also use the #each_with_object method:
array2 = array1.uniq
.each_with_object([]) { |e, a| a << e if array1.count(e) > 1 }

How to select unique elements

I would like to extend the Array class with a uniq_elements method which returns those elements with multiplicity of one. I would also like to be able to pass a block (closure) to my new method, as with uniq. For example:
t=[1,2,2,3,4,4,5,6,7,7,8,9,9,9]
t.uniq_elements # => [1,3,5,6,8]
Example with closure:
t=[1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2]
t.uniq_elements{|z| z.round} # => [2.0, 5.1]
Neither t-t.uniq nor t.to_set-t.uniq.to_set works. I don't care about speed; I call it only once in my program, so it can be slow.
Helper method
This method uses the helper:
class Array
def difference(other)
h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
reject { |e| h[e] > 0 && h[e] -= 1 }
end
end
This method is similar to Array#-. The difference is illustrated in the following example:
a = [3,1,2,3,4,3,2,2,4]
b = [2,3,4,4,3,4]
a - b #=> [1]
c = a.difference b #=> [1, 3, 2, 2]
As you see, a contains three 3's and b contains two, so the first two 3's in a are removed in constructing c (a is not mutated). When b contains at least as many instances of an element as does a, c contains no instances of that element. To remove elements beginning at the end of a:
a.reverse.difference(b).reverse #=> [3, 1, 2, 2]
Array#difference! could be defined in the obvious way.
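One possible reading of "the obvious way" is sketched below; difference! is a hypothetical method, not part of Ruby core. It uses the same counting hash as difference but mutates self with Array#reject!. Since reject! returns nil when nothing is removed, this sketch always returns self instead, which is a design assumption.

```ruby
class Array
  # Hypothetical in-place counterpart of the difference helper:
  # removes one occurrence from self for each occurrence in other.
  def difference!(other)
    counts = other.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
    reject! { |e| counts[e] > 0 && (counts[e] -= 1) }
    self # reject! would return nil when nothing is removed
  end
end

a = [3, 1, 2, 3, 4, 3, 2, 2, 4]
a.difference!([2, 3, 4, 4, 3, 4])
# a is now [1, 3, 2, 2]
```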
I have found many uses for this method.
I have proposed that this method be added to the Ruby core.
When used with Array#-, this method makes it easy to extract the unique elements from an array a:
a = [1,3,2,4,3,4]
u = a.uniq #=> [1, 3, 2, 4]
u - a.difference(u) #=> [1, 2]
This works because
a.difference(u) #=> [3,4]
contains all the non-unique elements of a (each possibly more than once).
Problem at Hand
Code
class Array
def uniq_elements(&prc)
prc ||= ->(e) { e }
a = map { |e| prc[e] }
u = a.uniq
uniques = u - a.difference(u)
select { |e| uniques.include?(prc[e]) ? (uniques.delete(prc[e]); true) : false }
end
end
Examples
t = [1,2,2,3,4,4,5,6,7,7,8,9,9,9]
t.uniq_elements
#=> [1,3,5,6,8]
t = [1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2]
t.uniq_elements { |z| z.round }
# => [2.0, 5.1]
Here's another way.
Code
require 'set'
class Array
def uniq_elements(&prc)
prc ||= ->(e) { e }
uniques, dups = {}, Set.new
each do |e|
k = prc[e]
((uniques.key?(k)) ? (dups << k; uniques.delete(k)) :
uniques[k] = e) unless dups.include?(k)
end
uniques.values
end
end
Examples
t = [1,2,2,3,4,4,5,6,7,7,8,9,9,9]
t.uniq_elements #=> [1,3,5,6,8]
t = [1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2]
t.uniq_elements { |z| z.round } # => [2.0, 5.1]
Explanation
if uniq_elements is called with a block, it is received as the proc prc.
if uniq_elements is called without a block, prc is nil, so the first statement of the method sets prc equal to the default proc (lambda).
an initially-empty hash, uniques, contains representations of the unique values. The values are the unique values of the array self, the keys are what is returned when the proc prc is passed the array value and called: k = prc[e].
the set dups contains the keys k that have been found not to be unique. It is a set (rather than an array) to speed lookups. Alternatively, it could be a hash with the non-unique keys as its keys, and arbitrary values.
the following steps are performed for each element e of the array self:
k = prc[e] is computed.
if dups contains k, e is a dup, so nothing more needs to be done; else
if uniques has a key k, e is a dup, so k is added to the set dups and the element with key k is removed from uniques; else
the element k=>e is added to uniques as a candidate for a unique element.
the values of uniques are returned.
class Array
def uniq_elements
counts = Hash.new(0)
arr = map do |orig_val|
converted_val = block_given? ? (yield orig_val) : orig_val
counts[converted_val] += 1
[converted_val, orig_val]
end
uniques = []
arr.each do |(converted_val, orig_val)|
uniques << orig_val if counts[converted_val] == 1
end
uniques
end
end
t=[1,2,2,3,4,4,5,6,7,7,8,9,9,9]
p t.uniq_elements
t=[1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2]
p t.uniq_elements { |elmt| elmt.round }
--output:--
[1, 3, 5, 6, 8]
[2.0, 5.1]
Array#uniq does not find non-duplicated elements; rather, Array#uniq removes duplicates.
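The difference is easy to see on the question's first array. The select/count line is a deliberately simple (quadratic) sketch, fine for small arrays, and not one of the answers' methods.

```ruby
t = [1, 2, 2, 3, 4, 4, 5, 6, 7, 7, 8, 9, 9, 9]

deduped = t.uniq                           # keeps one copy of every value
singles = t.select { |e| t.count(e) == 1 } # keeps only values that occur once
# deduped == [1, 2, 3, 4, 5, 6, 7, 8, 9]
# singles == [1, 3, 5, 6, 8]
```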
Use Enumerable#tally:
class Array
def uniq_elements
tally.select { |_obj, nb| nb == 1 }.keys
end
end
t=[1,2,2,3,4,4,5,6,7,7,8,9,9,9]
t.uniq_elements # => [1,3,5,6,8]
If you are using Ruby < 2.7, you can get tally with the backports gem
require 'backports/2.7.0/enumerable/tally'
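The tally version above ignores a block, so the question's rounding example would not work with it. A sketch that adds block support, assuming Ruby 2.7+ (or the backport): tally the block's results, then keep the original elements whose key occurs exactly once.

```ruby
class Array
  # Sketch: keys are the block's results (or the elements themselves);
  # keep each original element whose key appears exactly once.
  def uniq_elements
    keys = block_given? ? map { |e| yield e } : self
    counts = keys.tally
    select.with_index { |_e, i| counts[keys[i]] == 1 }
  end
end

[1, 2, 2, 3, 4, 4, 5, 6, 7, 7, 8, 9, 9, 9].uniq_elements
# => [1, 3, 5, 6, 8]
[1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2].uniq_elements { |z| z.round }
# => [2.0, 5.1]
```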
class Array
def uniq_elements
zip( block_given? ? map { |e| yield e } : self )
.each_with_object Hash.new do |(e, v), h| h[v] = h[v].nil? ? [e] : false end
.values.reject( &:! ).map &:first
end
end
[1,2,2,3,4,4,5,6,7,7,8,9,9,9].uniq_elements #=> [1, 3, 5, 6, 8]
[1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2].uniq_elements &:round #=> [2.0, 5.1]
Creating and calling a default proc is a waste of time, and
Cramming everything into one line using tortured constructs doesn't make the code more efficient--it just makes the code harder to understand.
In require statements, rubyists don't capitalize file names.
....
require 'set'
class Array
def uniq_elements
uniques = {}
dups = Set.new
each do |orig_val|
converted_val = block_given? ? (yield orig_val) : orig_val
next if dups.include? converted_val
if uniques.include?(converted_val)
uniques.delete(converted_val)
dups << converted_val
else
uniques[converted_val] = orig_val
end
end
uniques.values
end
end
t=[1,2,2,3,4,4,5,6,7,7,8,9,9,9]
p t.uniq_elements
t=[1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2]
p t.uniq_elements {|elmt|
elmt.round
}
--output:--
[1, 3, 5, 6, 8]
[2.0, 5.1]

Syntax for Block Local Variables

I am confused about a good style to adopt to define block local variables. The choices are:
Choice A:
method_that_calls_block { |v, w| puts v, w }
Choice B:
method_that_calls_block { |v; w| puts v, w }
The confusion is compounded when I want the block local to have a default value. The choices I am confused about are:
Choice C:
method_that_calls_block { |v, w = 1| puts v, w }
Choice D:
method_that_calls_block { |v, w: 1| puts v, w }
Is there a convention about how block local variables must be defined?
P.S. Also it seems the ; syntax does not work when I need to assign a default value to a block local variable! Strange.
Choice B is valid, though obscure, syntax, as @matt indicated (see: How to write an inline block to contain local variable scope in Ruby?).
Choice C gives a default value to w, which is an ordinary positional block parameter, while Choice D declares w as a keyword parameter with a default value.
All four of these are valid, but they all have different semantics -- which is correct depends on what you're trying to accomplish.
Examples
Consider the following method, which yields multiple values.
def frob
yield 1, 2, 3
end
Choice A: block parameters
"Get me the first two yielded values, if any, I don't care about the others."
frob { |v, w| [v, w].inspect}
# => "[1, 2]"
Choice B: block parameter + block-local variable
"Get me the first value, I don't care about the others; and give me an additional, uninitialized variable".
frob { |v; w| [v, w].inspect}
# => "[1, nil]"
Choice C: block parameters, some with default values
"Get me the first two values, and if the second value isn't initialized, set that variable to 1":
frob { |v, w = 1| [v, w].inspect }
# => "[1, 2]" <-- all values are present, default value ignored
"Get me the first five values, and if the fifth value isn't initialized, set that variable to 99":
frob { |v, w, x, y, z = 99| [v, w, x, y, z].inspect }
# => "[1, 2, 3, nil, 99]"
Choice D: positional and keyword block parameters
"Get me the first value, and if the method yields a keyword parameter w, get that, too; if not, set it to 1."
frob { |v, w: 1| [v, w].inspect }
# => "[1, 1]"
This is designed for the case where a method does yield keyword arguments:
def frobble
yield 1, 2, 3, w: 4
end
frobble { |v, w: 1| [v, w].inspect }
# => "[1, 4]"
In Ruby 2.7 and earlier, a block with a keyword parameter will also destructure a trailing hash, although Ruby 2.7 gives you a deprecation warning, just as if you'd passed a hash to a method that takes keyword arguments:
def frobnitz
h = {w: 99}
yield 1, 2, 3, h
end
# Ruby 2.7
frobnitz { |v, w: 1| [v, w].inspect }
# warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
# => "[1, 99]"
Ruby 3.0 doesn't give you a deprecation warning, but it also ignores the hash (it is just an extra positional argument, so the keyword default is used):
# Ruby 3.0
frobnitz { |v, w: 1| [v, w].inspect }
# => "[1, 1]"
Yielding an explicit keyword argument still works as expected in 3.0, though:
# Ruby 3.0
frobble { |v, w: 1| [v, w].inspect }
# => "[1, 4]"
Note that the keyword argument form will fail if the method yields unexpected keywords:
def frobnicate
yield 1, 2, 3, w: 99, z: -99
end
frobnicate { |v, w: 1| [v, w].inspect }
# => ArgumentError (unknown keyword: :z)
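If the block should tolerate keywords it doesn't name, a double-splat parameter can absorb them. A minimal sketch reusing frobnicate from above; the parameter name rest is arbitrary.

```ruby
def frobnicate
  yield 1, 2, 3, w: 99, z: -99
end

# **rest collects any keywords the block does not name explicitly.
result = frobnicate { |v, w: 1, **rest| [v, w, rest] }
# result == [1, 99, {z: -99}]
```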
Array destructuring
Another way in which the differences become obvious is when considering a method that yields an array:
def gork
yield [1, 2, 3]
end
Passing a block with a single argument will get you the whole array:
gork { |v| v.inspect }
# => "[1, 2, 3]"
Passing a block with multiple parameters, though, will get you the elements of the array, even if you declare too few parameters, or too many:
gork { |v, w| [v, w].inspect }
# => "[1, 2]"
gork { |v, w, x, y| [v, w, x, y].inspect }
# => "[1, 2, 3, nil]"
Here again the ; syntax for block-local variables can come in handy:
gork { |v; w| [v, w].inspect }
# => "[[1, 2, 3], nil]"
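Note, though, that even a keyword argument will still cause the array to be destructured (at least up to Ruby 3.1; since Ruby 3.2, a block with a single positional parameter plus keywords no longer auto-splats an array argument, so v receives the whole array):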
gork { |v, w: 99| [v, w].inspect }
# => "[1, 99]"
gork { |v, w: 99; x| [v, w, x].inspect }
# => "[1, 99, nil]"
Outer variable shadowing
Ordinarily, if you use the name of an outer variable inside a block, you're using that variable:
w = 1; frob { |v| w = 99}; w
# => 99
You can avoid this with any of the choices above; any of them will shadow the outer variable, hiding the outer variable from the block and ensuring that any effects the block has on it are local.
Choice A: block parameters:
w = 1; frob { |v, w| puts [v, w].inspect; w = 99}; w
# [1, 2]
# => 1
Choice B: block parameter + block-local variable
w = 1; frob { |v; w| puts [v, w].inspect; w = 99}; w
# [1, nil]
# => 1
Choice C: block parameters, some with default values
w = 1; frob { |v, w = 33| puts [v, w].inspect; w = 99}; w
# [1, 2]
# => 1
Choice D: positional and keyword block parameters
w = 1; frob { |v, w: 33| puts [v, w].inspect; w = 99}; w
# [1, 33]
# => 1
The other behavioral differences, though, still hold.
Default values
You can't set a default value for block-local variables.
frob { |v; w = 1| [v, w].inspect }
# syntax error, unexpected '=', expecting '|'
You also can't declare a block-local variable as a keyword parameter.
frob { |v; w: 1| [v, w].inspect }
# syntax error, unexpected ':', expecting '|'
If you know the method you're calling doesn't yield a keyword argument w, though, you can declare a fake keyword block parameter with a default value, and use that to get yourself a pre-initialized block-local variable. Repeated from the first Choice D example, above:
frob { |v, w: 1| [v, w].inspect }
# => "[1, 1]"
