I am confused about a good style to adopt to define block local variables. The choices are:
Choice A:
method_that_calls_block { |v, w| puts v, w }
Choice B:
method_that_calls_block { |v; w| puts v, w }
The confusion is compounded when I want the block local to have a default value. The choices I am confused about are:
Choice C:
method_that_calls_block { |v, w = 1| puts v, w }
Choice D:
method_that_calls_block { |v, w: 1| puts v, w }
Is there a convention about how block local variables must be defined?
P.S. Also, it seems the ; syntax does not work when I need to assign a default value to a block-local variable! Strange.
Choice B is valid. As @matt indicated, it is a valid (though obscure) syntax (see here: How to write an inline block to contain local variable scope in Ruby?)
Choice C gives a default value to w, which is a regular positional parameter, while Choice D is the syntax for a keyword argument with a default value.
All four of these are valid, but they all have different semantics -- which is correct depends on what you're trying to accomplish.
Examples
Consider the following method, which yields multiple values.
def frob
yield 1, 2, 3
end
Choice A: block parameters
"Get me the first two yielded values, if any, I don't care about the others."
frob { |v, w| [v, w].inspect}
# => "[1, 2]"
Choice B: block parameter + block-local variable
"Get me the first value, I don't care about the others; and give me an additional, uninitialized variable".
frob { |v; w| [v, w].inspect}
# => "[1, nil]"
Choice C: block parameters, some with default values
"Get me the first two values, and if the second value isn't initialized, set that variable to 1":
frob { |v, w = 1| [v, w].inspect }
# => "[1, 2]" <-- all values are present, default value ignored
"Get me the first five values, and if the fifth value isn't initialized, set that variable to 99":
frob { |v, w, x, y, z = 99| [v, w, x, y, z].inspect }
# => "[1, 2, 3, nil, 99]"
Choice D: positional and keyword block parameters
"Get me the first value, and if the method yields a keyword parameter w, get that, too; if not, set it to 1."
frob { |v, w: 1| [v, w].inspect }
# => "[1, 1]"
This is designed for the case where a method does yield keyword parameters:
def frobble
yield 1, 2, 3, w: 4
end
frobble { |v, w: 1| [v, w].inspect }
# => "[1, 4]"
Before Ruby 3.0, a block with a keyword parameter will also destructure a hash yielded as the last argument, although Ruby 2.7 will give you a deprecation warning, just as if you'd passed a hash to a method that takes keyword arguments:
def frobnitz
h = {w: 99}
yield 1, 2, 3, h
end
# Ruby 2.7
frobnitz { |v, w: 1| [v, w].inspect }
# warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
# => "[1, 99]"
Ruby 3.0 doesn't give you a deprecation warning, but it also ignores the hash:
# Ruby 3.0
frobnitz { |v, w: 1| [v, w].inspect }
# => "[1, 1]"
Yielding an explicit keyword argument still works as expected in 3.0, though:
# Ruby 3.0
frobble { |v, w: 1| [v, w].inspect }
# => "[1, 4]"
Note that the keyword argument form will fail if the method yields unexpected keywords:
def frobnicate
yield 1, 2, 3, w: 99, z: -99
end
frobnicate { |v, w: 1| [v, w].inspect }
# => ArgumentError (unknown keyword: :z)
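If the block should tolerate keywords it does not know about, one option (not part of the original examples) is a double-splat parameter that collects the leftovers. A minimal sketch, where the name rest is only an illustration:

```ruby
# Same method as above: yields three positional values and two keywords.
def frobnicate
  yield 1, 2, 3, w: 99, z: -99
end

# **rest (a hypothetical name) absorbs any keyword the block does not name,
# so the unknown keyword :z no longer raises ArgumentError.
result = frobnicate { |v, w: 1, **rest| [v, w, rest] }
# v is 1, w is 99, and rest holds the leftover keyword z: -99
```

The extra positional values 2 and 3 are still dropped silently, as in the other examples.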
Array destructuring
Another way in which the differences become obvious is when considering a method that yields an array:
def gork
yield [1, 2, 3]
end
Passing a block with a single argument will get you the whole array:
gork { |v| v.inspect }
# => "[1, 2, 3]"
Passing a block with multiple arguments, though, will get you the elements of the array, even if you pass too few arguments, or too many:
gork { |v, w| [v, w].inspect }
# => "[1, 2]"
gork { |v, w, x, y| [v, w, x, y].inspect }
# => "[1, 2, 3, nil]"
Here again the ; syntax for block-local variables can come in handy:
gork { |v; w| [v, w].inspect }
# => "[[1, 2, 3], nil]"
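Note, though, that even a keyword argument will still cause the array to be destructured (Ruby 3.2 and later no longer auto-splat a block that takes a single positional parameter plus keywords, so there v receives the whole array):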
gork { |v, w: 99| [v, w].inspect }
# => "[1, 99]"
gork { |v, w: 99; x| [v, w, x].inspect }
# => "[1, 99, nil]"
Outer variable shadowing
Ordinarily, if you use the name of an outer variable inside a block, you're using that variable:
w = 1; frob { |v| w = 99}; w
# => 99
You can avoid this with any of the choices above; any of them will shadow the outer variable, hiding the outer variable from the block and ensuring that any effects the block has on it are local.
Choice A: block parameters:
w = 1; frob { |v, w| puts [v, w].inspect; w = 99}; w
# [1, 2]
# => 1
Choice B: block parameter + block-local variable
w = 1; frob { |v; w| puts [v, w].inspect; w = 99}; w
# [1, nil]
# => 1
Choice C: block parameters, some with default values
w = 1; frob { |v, w = 33| puts [v, w].inspect; w = 99}; w
# [1, 2]
# => 1
Choice D: positional and keyword block parameters
w = 1; frob { |v, w: 33| puts [v, w].inspect; w = 99}; w
# [1, 33]
# => 1
The other behavioral differences, though, still hold.
Default values
You can't set a default value for block-local variables.
frob { |v; w = 1| [v, w].inspect }
# syntax error, unexpected '=', expecting '|'
You also can't use keyword-argument syntax for a block-local variable.
frob { |v; w: 1| [v, w].inspect }
# syntax error, unexpected ':', expecting '|'
If you know the method you're calling doesn't yield a value for it, though, you can declare a fake block parameter with a default value, and use that to get yourself a pre-initialized block-local variable. Repeated from the first Choice D example, above:
frob { |v, w: 1| [v, w].inspect }
# => "[1, 1]"
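If relying on a fake parameter feels too fragile, a plainer alternative is to declare the block-local variable after the ; and assign it on the first line of the block. A minimal sketch, reusing frob from above (the outer value :outer is just a placeholder):

```ruby
def frob
  yield 1, 2, 3
end

w = :outer
# w is declared block-local after the semicolon, then initialized explicitly.
result = frob { |v; w| w = 1; [v, w] }
# result is [1, 1]; the outer w is untouched
```

This keeps working no matter what the method yields, since w is never a parameter.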
Related
Trying to use ruby Hash merge! on multiple hashes, starting with an empty hash
a = {}
b = {x: 1.2, y: 1.3}
c = {x: 1.4, y: 1.5}
fact = 100 # need to multiply values that are merged in with this
a.merge!(b) {|k,v1,v2| v1 + v2 * fact} # it doesn't multiply values with fact
a.merge!(c) {|k,v1,v2| v1 + v2 * fact} #it does multiply values with fact
So the first merge does not give me the result I was expecting, while the second merge does. Please note that in the real app keys are not limited to x and y; there can be many different keys.
The first merge works as described in the documentation.
The block is invoked only to solve conflicts, when a key is present in both hashes. On the first call to Hash#merge!, a is empty, hence no conflict occurred and the content of b is copied into a without any changes.
You can fix the code by initializing a with {x: 0, y: 0}.
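Since the real keys are not known in advance, the zero-filled hash can be built from the keys of the hashes being merged. A minimal sketch of that idea, assuming all values are numeric (the block parameter names are illustrative):

```ruby
b = { x: 1.2, y: 1.3 }
c = { x: 1.4, y: 1.5 }
fact = 100

# Seed every key with 0 so that each merge! finds a conflict and runs the block.
a = (b.keys | c.keys).map { |k| [k, 0] }.to_h

[b, c].each do |other|
  a.merge!(other) { |_key, old_value, new_value| old_value + new_value * fact }
end
# a[:x] is now about 260 and a[:y] about 280 (floating-point arithmetic)
```

Because every key already exists in a, the block now runs for the first merge as well.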
I would be inclined to perform the merge as follows.
a = {}
b = {x: 1.2, y: 1.3}
c = {x: 1.4, y: 1.5}
[b, c].each_with_object(a) { |g,h| h.update(g) { |_,o,n| o+n } }.
tap { |h| h.keys.each { |k| h[k] *= 10 } }
#=> {:x=>25.999999999999996, :y=>28.0}
Note that this works with any number of hashes (b, c, d, ...) and any number of keys ({ x: 1.2, y: 1.3, z: 2.1, ... }).
The steps are as follows [1].
e = [b, c].each_with_object(a)
#=> #<Enumerator: [{:x=>1.2, :y=>1.3}, {:x=>1.4, :y=>1.5}]:each_with_object({})>
We can see the values that will be generated by this enumerator by applying Enumerable#entries [2]:
e.entries
#=> [[{:x=>1.2, :y=>1.3}, {}], [{:x=>1.4, :y=>1.5}, {}]]
We can use Enumerator#next to generate the first value of e and assign the two block variables to it (that is, "pass e.next to the block"):
g,h = e.next
#=> [{:x=>1.2, :y=>1.3}, {}]
g #=> {:x=>1.2, :y=>1.3}
h #=> {}
Next we perform the block calculation.
f = h.update(g) { |_,o,n| o+n }
#=> {:x=>1.2, :y=>1.3}
Here I have used the form of Hash#update (aka merge!) which employs a block to determine the values of keys that are present in both hashes being merged. (See the doc for details.) As h is now empty (no keys), the block is not used for this merge.
The next and last value of e is now generated and the process is repeated.
g,h = e.next
#=> [{:x=>1.4, :y=>1.5}, {:x=>1.2, :y=>1.3}]
g #=> {:x=>1.4, :y=>1.5}
h #=> {:x=>1.2, :y=>1.3}
f = h.update(g) { |_,o,n| o+n }
#=> {:x=>2.5999999999999996, :y=>2.8}
Since g and h both have a key :x, the block is used to determine the new value of h[:x]:
_ #=> :x
o #=> 1.2
n #=> 1.4
h[:x] = o + n
#=> 2.5999999999999996
Similarly, h[:y] = 2.8.
The last step uses Object#tap to multiply each value by 10.
f.tap { |h| h.keys.each { |k| h[k] *= 10 } }
#=> {:x=>25.999999999999996, :y=>28.0}
tap does nothing more than save a line of code and the creation of a local variable, as I could have instead written:
h = [b, c].each_with_object(a) { |g,h| h.update(g) { |_,o,n| o+n } }
h.keys.each { |k| h[k] *= 10 }
h
Another option (that does not use tap) is to write:
f = [b, c].flat_map(&:keys).uniq.product([0]).to_h
#=> {:x=>0, :y=>0}
[b, c].each_with_object(f) { |g,h| h.update(g) { |_,o,n| o+10*n } }
#=> {:x=>26.0, :y=>28.0}
[1] Experienced Rubyists: GORY DETAIL ALERT!
[2] Hash#to_a could also be used here.
Today I was surprised to find that Ruby automatically finds the values of an array given as a block parameter.
For example:
foo = "foo"
bar = "bar"
p foo.chars.zip(bar.chars).map { |pair| pair }.first #=> ["f", "b"]
p foo.chars.zip(bar.chars).map { |a, b| "#{a},#{b}" }.first #=> "f,b"
p foo.chars.zip(bar.chars).map { |a, b,c| "#{a},#{b},#{c}" }.first #=> "f,b,"
I would have expected the last two examples to give some sort of error.
Is this an example of a more general concept in ruby?
I don't think my wording at the start of my question is correct. What do I call what is happening here?
Ruby blocks are quirky like that.
The rule is this: if a block takes more than one argument and is yielded a single object that responds to to_ary, then that object is expanded. This makes yielding an array and yielding a tuple seem to behave the same way for blocks that take two or more arguments.
yield [a,b] and yield a,b do differ, though, when the block takes only one argument or a variable number of arguments.
Let me demonstrate both.
def yield_tuple
yield 1, 2, 3
end
yield_tuple { |*a| p a }
yield_tuple { |a| p [a] }
yield_tuple { |a, b| p [a, b] }
yield_tuple { |a, b, c| p [a, b, c] }
yield_tuple { |a, b, c, d| p [a, b, c, d] }
prints
[1, 2, 3]
[1]
[1, 2]
[1, 2, 3]
[1, 2, 3, nil]
Whereas
def yield_array
yield [1,2,3]
end
yield_array { |*a| p a }
yield_array { |a| p [a] }
yield_array { |a, b| p [a, b] }
yield_array { |a, b, c| p [a, b, c] }
yield_array { |a, b, c, d| p [a, b, c, d] }
prints
[[1, 2, 3]]
[[1, 2, 3]]
[1, 2] # array expansion makes it look like a tuple
[1, 2, 3] # array expansion makes it look like a tuple
[1, 2, 3, nil] # array expansion makes it look like a tuple
And finally to show that everything in Ruby uses duck-typing
class A
def to_ary
[1,2,3]
end
end
def yield_arrayish
yield A.new
end
yield_arrayish { |*a| p a }
yield_arrayish { |a| p [a] }
yield_arrayish { |a, b| p [a, b] }
yield_arrayish { |a, b, c| p [a, b, c] }
yield_arrayish { |a, b, c, d| p [a, b, c, d] }
prints
[#<A:0x007fc3c2969190>]
[#<A:0x007fc3c2969050>]
[1, 2] # array expansion makes it look like a tuple
[1, 2, 3] # array expansion makes it look like a tuple
[1, 2, 3, nil] # array expansion makes it look like a tuple
PS, the same array expansion behavior applies for proc closures which behave like blocks, whereas lambda closures behave like methods.
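The proc/lambda difference mentioned in the PS can be checked directly. A small sketch (the variable names are illustrative):

```ruby
pair = [1, 2]

# A proc treats a single array argument like a block does and expands it.
as_proc = proc { |a, b| [a, b] }
proc_result = as_proc.call(pair)

# A lambda checks arity strictly, like a method, so the same call raises.
as_lambda = lambda { |a, b| [a, b] }
lambda_error =
  begin
    as_lambda.call(pair)
    nil
  rescue ArgumentError => e
    e
  end

# Splatting the array explicitly satisfies the lambda.
lambda_result = as_lambda.call(*pair)
```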
Ruby's block mechanics have a quirk: if you're iterating over something that contains arrays, you can expand them out into different variables:
[ %w[ a b ], %w[ c d ] ].each do |a, b|
puts 'a=%s b=%s' % [ a, b ]
end
This pattern is very useful when using Hash#each and you want to break out the key and value parts of the pair: each { |k,v| ... } is very common in Ruby code.
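As a quick illustration of the Hash#each case, here is a sketch with made-up sample data (prices is hypothetical):

```ruby
prices = { apple: 3, pear: 4 } # hypothetical sample data

# Declaring two block parameters splits each entry into key and value.
lines = []
prices.each { |name, price| lines << "#{name} costs #{price}" }

# With a single parameter the block receives the whole [key, value] pair.
pairs = []
prices.each { |pair| pairs << pair }
```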
If your block takes more than one argument and the element being iterated is an array then it switches how the arguments are interpreted. You can always force-expand:
[ %w[ a b ], %w[ c d ] ].each do |(a, b)|
puts 'a=%s b=%s' % [ a, b ]
end
That's useful for cases where things are more complex:
[ %w[ a b ], %w[ c d ] ].each_with_index do |(a, b), i|
puts 'a=%s b=%s # %d' % [ a, b, i ]
end
In this case it's iterating over an array with another element tacked on, so each item is actually a tuple of the form %w[ a b ], 0 internally, which will be converted to an array if your block only accepts one argument.
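To see why the parentheses matter with each_with_index, compare what the parameters receive with and without them. A minimal sketch (rows, flat and nested are illustrative names):

```ruby
rows = [%w[a b], %w[c d]]

# Without parentheses the three parameters line up with the two yielded
# values (element, index), so the last parameter is left nil.
flat = []
rows.each_with_index { |a, b, i| flat << [a, b, i] }

# With parentheses the first yielded value (the inner array) is unpacked.
nested = []
rows.each_with_index { |(a, b), i| nested << [a, b, i] }
```

Here flat holds the inner array, the index and nil for each row, while nested holds the two letters followed by the index.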
This is much the same principle you can use when defining variables:
a, b = %w[ a b ]
a
# => 'a'
b
# => 'b'
That actually assigns independent values to a and b. Contrast with:
a, b = [ %w[ a b ] ]
a
# => [ 'a', 'b' ]
b
# => nil
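The parenthesized form from the block examples works in ordinary assignment too. A short sketch (the variable names are illustrative):

```ruby
# Parentheses on the left-hand side unpack a nested array, just as
# |(a, b), i| does in a block parameter list.
(a, b), i = %w[a b], 0

# A splat collects whatever is left over.
first, *others = [1, 2, 3]
```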
"I would have expected the last two examples to give some sort of error."
It does in fact work that way if you pass a proc from a method. Yielding to such a proc is much stricter – it checks its arity and doesn't attempt to convert an array argument to an argument list:
def m(a, b)
"#{a}-#{b}"
end
['a', 'b', 'c'].zip([0, 1, 2]).map(&method(:m))
#=> wrong number of arguments (given 1, expected 2) (ArgumentError)
This is because zip creates an array (of arrays) and map just yields each element, i.e.
yield ['a', 0]
yield ['b', 1]
yield ['c', 2]
each_with_index on the other hand works:
['a', 'b', 'c'].each_with_index.map(&method(:m))
#=> ["a-0", "b-1", "c-2"]
because it yields two separate values, the element and its index, i.e.
yield 'a', 0
yield 'b', 1
yield 'c', 2
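If you do need to feed the zipped pairs to a strict two-argument method, one way (a sketch, not the only option) is to go through an ordinary block, which either expands the pair for you or lets you splat it explicitly:

```ruby
def m(a, b)
  "#{a}-#{b}"
end

pairs = ['a', 'b', 'c'].zip([0, 1, 2])

# A regular block expands each yielded pair, so it can forward the two
# elements to the strict method.
via_block = pairs.map { |a, b| m(a, b) }

# Alternatively, take the pair as one argument and splat it explicitly.
via_splat = pairs.map { |pair| m(*pair) }
```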
The code:
a = [1, 2, 3]
h = {a: 1}
def f args
p args
end
h.map(&method(:f))
a.map(&method(:f))
h.map do |k,v|
p [k,v]
end
The output:
[:a, 1]
1
2
3
[:a, 1]
Why can't I define f for a hash as follows?
def f k, v
p [k, v]
end
You are correct that the reason stems from one of the two main differences between procs and lambdas. I'll try explaining it in a slightly different way than you did.
Consider:
a = [:a, 1]
h = {a: 1}
def f(k,v)
p [k, v]
end
a.each(&method(:f))
#-> in `f': wrong number of arguments (1 for 2) (ArgumentError)
h.each(&method(:f))
#-> in `f': wrong number of arguments (1 for 2) (ArgumentError)
where I use #-> to show what is printed and #=> to show what is returned. You used map, but each is more appropriate here, and makes the same point.
In both cases elements of the receiver are being passed to the block [1]:
&method(:f)
which is (more-or-less, as I will explain) equivalent to:
{ |k,v| p [k,v] }
The block is complaining (for both the array and hash) that it is expecting two arguments but receiving only one, and that is not acceptable. "Hmmm", the reader is thinking, "why doesn't it disambiguate in the normal way?"
Let's try using the block directly:
a.map { |k,v| p [k,v] }
#-> [:a, nil]
# [1, nil]
h.map { |k,v| p [k,v] }
#-> [:a, 1]
This works as expected, but does not return what we wanted for the array.
The first element of a (:a) is passed into the block and the block variables are assigned:
k,v = :a
#=> :a
k #=> :a
v #=> nil
and
p [k,v]
#-> [:a, nil]
Next, 1 is passed to the block and [1,nil] is printed.
Let's try one more thing, using a proc created with Proc::new:
fp = Proc.new { |k,v| p [k, v] }
#=> #<Proc:0x007ffd6a0a8b00#(irb):34>
fp.lambda?
#=> false
a.each { |e| fp.call(e) }
#-> [:a, nil]
#-> [1, nil]
h.each { |e| fp[e] }
#-> [:a, 1]
(Here I've used one of three aliases for Proc#call.) We see that calling the proc has the same result as using a block. The proc expects two arguments but receives only one and, unlike the lambda, does not complain [2].
This tells us that we need to make small changes to a and f:
a = [[:a, 1]]
h = {a: 1}
def f((k,v))
p [k, v]
end
a.each(&method(:f))
#-> [:a, 1]
h.each(&method(:f))
#-> [:a, 1]
Incidentally, I think you may have fooled yourself with the variable name args:
def f args
p args
end
as the method has a single argument regardless of what you call it. :-)
[1] The block is created by & calling Method#to_proc on the method f and then converting the proc (actually a lambda) to a block.
[2] From the docs for Proc: "For procs created using lambda or ->() an error is generated if the wrong number of parameters are passed to a Proc with multiple parameters. For procs created using Proc.new or Kernel.proc, extra parameters are silently discarded."
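The claim that a method converted with & is "actually a lambda" can be verified with Proc#lambda?. A small sketch, using a variant of f that returns the pair instead of printing it:

```ruby
def f(k, v)
  [k, v]
end

from_method = method(:f).to_proc
plain_proc  = proc { |k, v| [k, v] }

from_method.lambda?  # true: converted methods keep strict argument checking
plain_proc.lambda?   # false: a plain proc is lenient, like a block

# The lenient proc expands the single array; the strict one raises.
plain_result = plain_proc.call([:a, 1])
strict_raises =
  begin
    from_method.call([:a, 1])
    false
  rescue ArgumentError
    true
  end
```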
As it appears, it must be some sort of implicit destructuring (or non-strict arguments handling), which works for procs, but doesn't for lambdas:
irb(main):007:0> Proc.new { |k,v| p [k,v] }.call([1,2])
[1, 2]
=> [1, 2]
irb(main):009:0> lambda { |k,v| p [k,v] }.call([1,2])
ArgumentError: wrong number of arguments (1 for 2)
from (irb):9:in `block in irb_binding'
from (irb):9:in `call'
from (irb):9
from /home/yuri/.rubies/ruby-2.1.5/bin/irb:11:in `<main>'
But one can make it work:
irb(main):010:0> lambda { |(k,v)| p [k,v] }.call([1,2])
[1, 2]
=> [1, 2]
And therefore:
def f ((k, v))
p [k, v]
end
So Hash#map always passes one argument.
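One way to confirm this is to capture everything the block receives with a splat. A minimal sketch (received, unpack and unpacked are illustrative names):

```ruby
h = { a: 1 }

# A splat parameter captures every argument the block receives; for
# Hash#map that is a single [key, value] array.
received = h.map { |*args| args }

# A lambda with a parenthesized parameter accepts that one argument and
# unpacks it itself.
unpack = lambda { |(k, v)| [k, v] }
unpacked = h.map(&unpack)
```

Each entry of received is a one-element array wrapping the pair, which is why a strict two-parameter lambda or method rejects it.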
UPD
This implicit destructuring also happens in block arguments.
names = ["Arthur", "Ford", "Trillian"]
ids = [42, 43, 44]
id_names = ids.zip(names) #=> [[42, "Arthur"], [43, "Ford"], [44, "Trillian"]]
id_names.each do |id, name|
puts "user #{id} is #{name}"
end
http://globaldev.co.uk/2013/09/ruby-tips-part-2/
UPD Don't get me wrong. I'm not suggesting writing such code (def f ((k, v))). In the question I was asking for an explanation, not for a solution.
Block-local variables are to prevent a block from tampering with variables outside of its scope.
Using a block-local variable
x = 10
3.times do |y; x|
x = y
end
x # => 10
But this is easily done by declaring a regular block parameter. A new local scope is created for that parameter, which takes precedence over previous variables/scopes.
Without using a block-local variable
x = 10
3.times do |y, x|
x = y
end
x # => 10
The variable x outside the block doesn't get changed in either case. Is there any need for block-local variables other than for enhancing readability?
The block parameter is a real parameter, while a block local variable is not.
If you give yield two parameters like this:
def foo
yield("hello", "world")
end
Calling
x = 10
foo do |y; x|
puts x
end
x is nil inside the block because only the first argument is assigned to y; the second argument is discarded.
Calling
x = 10
foo do |y, x|
puts x
end
#=>world
x gets the parameter correctly as "world".
To expand on Yu Hao's answer, the difference between block parameters and block-local variables is not obvious when calling a method that only yields one value, but consider a method that yields multiple values:
def frob
yield 1, 2, 3
end
If you pass this a block with a single argument, you get the first value:
frob { |a| a.inspect }
# => "1"
But if you pass a block with multiple arguments, you get multiple values, even if you pass too few arguments, or too many:
frob { |a, b, c| [a, b, c].inspect }
# => "[1, 2, 3]"
frob { |a, b| [a, b].inspect }
# => "[1, 2]"
frob { |a, b, c, d| [a, b, c, d].inspect }
# => "[1, 2, 3, nil]"
If you pass block-scoped variables, however, those are independent of the yielded value(s):
frob { |a; b, c| [a, b, c].inspect }
# => "[1, nil, nil]"
Something similar happens with methods that yield an array, except that when you pass a block with a single argument, it gets the whole array:
def frobble
yield [1, 2, 3]
end
frobble {|a| a.inspect }
# => "[1, 2, 3]"
Multiple arguments, however, destructure the array --
frobble {|a, b| [a, b].inspect }
# => "[1, 2]"
-- while a block-scoped variable doesn't:
frobble {|a; b| [a, b].inspect }
# => "[[1, 2, 3], nil]"
(Even with a block-scoped variable present, though, multiple values will still destructure the array: frobble {|a, b; c| [a, b, c].inspect } will get you "[1, 2, nil]".)
For more discussion and examples, see also this answer.
In the following code, if I don't add parentheses around the key, value parameters in "original.inject" (i.e., I do original.inject({}){ |result, key, value| ), I get a nil error as in the code comments below, as if the value is not being passed correctly. Why is this? What exactly is going on here? (I'm running ruby-2.1.1)
def hash_test(original,options={},&block)
original.inject({}){ |result, (key, value)|
value = value + 2
block.call(result, key, value)
result
}
end
h={:a=>3, :b=>4}
r = hash_test(h) { |result, key, value| result[key]=value }
puts r #=> {:a=>5, :b=>6}
#if no parentheses around (key, value) in original.inject, you get a:
# hash_transformer.rb:5:in `block in hash_test': undefined method `+' for nil:NilClass (NoMethodError)
# from hash_transformer.rb:4:in `each'
# from hash_transformer.rb:4:in `inject'
# from hash_transformer.rb:4:in `hash_test'
# from hash_transformer.rb:15:in `<main>'
When inject is called on a Hash, it yields each key-value pair as an array to the supplied block, as in the example below:
{:a=>3, :b=>4}.inject({}) do |x,y|
p y
x[y[0]] = y[1]
x
end
#>>[:a, 3]
#>>[:b, 4]
#=> {:a=>3, :b=>4}
y yielded the first time is [:a, 3], and the second time is [:b, 4], so you need to destructure the array by putting parentheses around the arguments:
(a,b) = [:a, 3]
a #=> :a
b #=> 3
This shows what's going on.
With deconstruction of hash element
{:a=>3, :b=>4}.inject({}) do |x,(y,z)|
puts "x = #{x}"
puts "y = #{y}"
puts "z = #{z}"
puts
x
end
x = {}
y = a
z = 3
x = {}
y = b
z = 4
Without deconstruction of hash element
{:a=>3, :b=>4}.inject({}) do |x,y,z|
puts "x = #{x}"
puts "y = #{y}"
puts "z.nil? = #{z.nil?}"
puts
x
end
x = {}
y = [:a, 3]
z.nil? = true
x = {}
y = [:b, 4]
z.nil? = true
Take a look at the documentation:
http://apidock.com/ruby/Enumerable/inject
It says that inject passes a pair of parameters to the block: the accumulator and the current element.
If you want to use three parameters, as in inject do |c, key, val|, then you should use parentheses to group (key, val).
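Putting that together, here is a short sketch of both forms side by side (original mirrors the hash from the question; seen is an illustrative name):

```ruby
original = { a: 3, b: 4 }

# inject yields two values: the accumulator and the current element.
# For a hash the element is a [key, value] pair, which the parentheses unpack.
with_parens = original.inject({}) do |result, (key, value)|
  result[key] = value + 2
  result
end

# Without parentheses the pair lands in the second parameter and the
# third is nil, which is what caused the NoMethodError in the question.
seen = []
original.inject({}) do |result, key, value|
  seen << [key, value]
  result
end
```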