Why do you have to specify 2 arguments explicitly to curry :> - ruby

Consider this, which works fine:
:>.to_proc.curry(2)[9][8] #=> true, because 9 > 8
However, even though > is a binary operator, the above won't work without the arity specified:
:>.to_proc.curry[9][8] #=> ArgumentError: wrong number of arguments (0 for 1)
Why aren't the two equivalent?
Note: I specifically want to create the intermediate curried function with one arg supplied, and then call that with the 2nd arg.

curry has to know the arity of the proc passed in, right?
:<.to_proc.arity # => -1
Negative values from arity are confusing, but basically mean 'variable number of arguments' one way or another.
Compare to:
less_than = lambda {|a, b| a < b}
less_than.arity # => 2
When you create a lambda saying it takes two arguments, it knows it takes two arguments, and will work fine with that style of calling #curry.
less_than.curry[9][8] # => false, no problem!
But when you use the Symbol#to_proc trick, it has only a symbol to go on and no idea how many arguments the method takes. While I don't think < is actually an ordinary method in Ruby, I think you're right that it necessarily takes two args. Symbol#to_proc, however, is a general-purpose method that works on any method name; it has no idea how many args the method should take, so it defines the proc with variable arguments.
I don't read C well enough to follow the MRI implementation, but I assume Symbol#to_proc defines a proc with variable arguments. The more typical use of Symbol#to_proc, of course, is for no-argument methods. You can for instance do this with it if you want:
hello_proc = :hello.to_proc
class SomeClass
def hello(name = nil)
puts "Hello, #{name}!"
end
end
obj = SomeClass.new
obj.hello #=> "Hello, !"
obj.hello("jrochkind") #=> "Hello, jrochkind!"
obj.hello("jrochkind", "another")
# => ArgumentError: wrong number of arguments calling `hello` (2 for 1)
hello_proc.call(obj) # => "Hello, !"
hello_proc.call(obj, "jrochkind") # => "Hello, jrochkind!"
hello_proc.call(obj, "jrochkind", "another")
# => ArgumentError: wrong number of arguments calling `hello` (2 for 1)
hello_proc.call("Some string")
# => NoMethodError: undefined method `hello' for "Some string":String
Note I did hello_proc = :hello.to_proc before I even defined SomeClass. The Symbol#to_proc mechanism creates a variable arity proc, that knows nothing about how or where or on what class it will be called, it creates a proc that can be called on any class at all, and can be used with any number of arguments.
If it were defined in ruby instead of C, it would look something like this:
class Symbol
def to_proc
method_name = self
proc {|receiver, *other_args| receiver.send(method_name, *other_args) }
end
end
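To make the arity point concrete, here is a minimal sketch that uses a hand-written stand-in for Symbol#to_proc. The helper name symbol_like_proc is hypothetical and only for illustration; it is not how MRI implements the real method.

```ruby
# Hypothetical stand-in for Symbol#to_proc, written in plain Ruby.
def symbol_like_proc(method_name)
  proc { |receiver, *other_args| receiver.send(method_name, *other_args) }
end

greater_than = symbol_like_proc(:>)

# One required parameter followed by a splat is reported as -2,
# so curry cannot tell that two arguments are really needed.
greater_than.arity # => -2

# Spelling out the arity makes curry wait for both arguments.
nine_is_greater_than = greater_than.curry(2)[9]
nine_is_greater_than[8]  # => true
nine_is_greater_than[10] # => false
```

The explicit 2 is the only place where the number of arguments is stated; the proc itself never knows it.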

I think it is because Symbol#to_proc creates a proc with one argument. When turned into a proc, :> does not look like:
->x, y{...}
but it looks like:
->x{...}
with the requirement of the original single argument of > somehow tucked inside the proc body (notice that > is not a method that takes two arguments, it is a method called on one receiver with one argument). In fact,
:>.to_proc.arity # => -1
->x, y{}.arity # => 2
which means that applying curry to it without argument would only have a trivial effect; it takes a proc with one parameter, and returns itself. By explicitly specifying 2, it does something non-trivial. For comparison, consider join:
:join.to_proc.arity # => -1
:join.to_proc.call(["x", "y"]) # => "xy"
:join.to_proc.curry.call(["x", "y"]) # => "xy"
Notice that providing a single argument after currying :join already evaluates the whole method.

@jrochkind's answer does a great job of explaining why :>.to_proc.curry doesn't have the behavior you want. I wanted to mention, though, that there's a solution to this part of your question:
I specifically want to create the intermediate curried function with one arg supplied, and then call that with the 2nd arg.
The solution is Object#method. Instead of this:
nine_is_greater_than = :>.to_proc.curry[9]
nine_is_greater_than[8]
#=> ArgumentError: wrong number of arguments (0 for 1)
...do this:
nine_is_greater_than = 9.method(:>)
nine_is_greater_than[8]
# => true
Object#method returns a Method object, which acts just like a Proc: it responds to call, [], and even (as of Ruby 2.2) curry. However, if you need a real proc (or want to use curry with Ruby < 2.2) you can also call to_proc on it (or use &, the to_proc operator):
[ 1, 4, 8, 10, 20, 30 ].map(&nine_is_greater_than)
# => [ true, true, true, false, false, false ]
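If you would rather stay with curry, another option (a sketch based on the lambda shown in the first answer, not part of this answer's original code) is to start from a lambda with an explicit two-parameter signature, so that curry can read the arity by itself:

```ruby
# A lambda knows it takes exactly two arguments.
greater_than = ->(a, b) { a > b }
greater_than.arity # => 2

# No explicit arity needed: curry waits for the second argument.
nine_is_greater_than = greater_than.curry[9]
nine_is_greater_than[8] # => true

results = [1, 4, 8, 10, 20, 30].map(&nine_is_greater_than)
# => [true, true, true, false, false, false]
```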

why pass block arguments to a function in ruby?

I'm unclear on why there is a need to pass block arguments when calling a function.
Why not just pass them in as function arguments? And what happens to the block arguments: how are they passed and used?
m.call(somevalue) {|_k, v| v['abc'] = 'xyz'}
module m
def call ( arg1, *arg2, &arg3)
end
end
Ruby, like almost all mainstream programming languages, is a strict language, meaning that arguments are fully evaluated before being passed into the method.
Now, imagine you want to implement (a simplified version of) Integer#times. The implementation would look a little bit like this:
class Integer
def my_times(action_to_be_executed)
raise ArgumentError, "`self` must be non-negative but is `#{inspect}`" if negative?
return if zero?
action_to_be_executed
pred.my_times(action_to_be_executed)
end
end
3.my_times(puts "Hello")
# Hello
0.my_times(puts "Hello")
# Hello
-1.my_times(puts "Hello")
# Hello
# ArgumentError (`self` must be non-negative but is `-1`)
As you can see, 3.my_times(puts "Hello") printed Hello exactly once, instead of thrice, as it should do. Also, 0.my_times(puts "Hello") printed Hello exactly once, instead of not at all, as it should do, despite the fact that it returns in the second line of the method, and thus action_to_be_executed is never even evaluated. Even -1.my_times(puts "Hello") printed Hello exactly once, despite the fact that it raises an ArgumentError exception as the very first thing in the method and thus the entire rest of the method body is never evaluated.
Why is that? Because Ruby is strict! Again, strict means that arguments are fully evaluated before being passed. So, what this means is that before my_times even gets called, the puts "Hello" is evaluated (which prints Hello to the standard output stream), and the result of that evaluation (which is just nil because Kernel#puts always returns nil) is passed into the method.
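A minimal sketch of the same effect, using a hypothetical method ignore that never touches its argument and an array as a side-effect log instead of puts:

```ruby
# Hypothetical method that never looks at its argument.
def ignore(_argument)
  :ignored
end

log = []
# The argument expression runs before ignore is even entered.
ignore(log << "evaluated before the call")
log # => ["evaluated before the call"]

deferred_log = []
# Wrapping the code in a lambda only builds an object; the body does not run.
ignore(-> { deferred_log << "never evaluated" })
deferred_log # => []
```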
So, what we need to do, is somehow delay the evaluation of the argument. One way we know how to delay evaluation, is by using a method: methods are only evaluated when they are called.
So, we take a page out of Java's playbook, and define a Single Abstract Method Protocol: the argument that is being passed to my_times must be an object which implements a method with a specific name. Let's call it call, because, well, we are going to call it.
This would look a little bit like this:
class Integer
def my_times(action_to_be_executed)
raise ArgumentError, "`self` must be non-negative but is `#{inspect}`" if negative?
return if zero?
action_to_be_executed.call
pred.my_times(action_to_be_executed)
end
end
def (hello = Object.new).call
puts "Hello"
end
3.my_times(hello)
# Hello
# Hello
# Hello
0.my_times(hello)
-1.my_times(hello)
# ArgumentError (`self` must be non-negative but is `-1`)
Nice! It works! The argument that is passed is of course still strictly evaluated before being passed (we can't change the fundamental nature of Ruby from within Ruby itself), but this evaluation only results in the object that is bound by the local variable hello. The code that we want to run is another layer of indirection away and will only be executed at the point where we actually call it.
It also has another advantage: Integer#times actually makes the index of the current iteration available to the action as an argument. This was impossible to implement with our first solution, but here we can do it, because we are using a method and methods can take arguments:
class Integer
def my_times(action_to_be_executed)
raise ArgumentError, "`self` must be non-negative but is `#{inspect}`" if negative?
__my_times_helper(action_to_be_executed)
end
protected
def __my_times_helper(action_to_be_executed, index = 0)
return if zero?
action_to_be_executed.call(index)
pred.__my_times_helper(action_to_be_executed, index + 1)
end
end
def (hello = Object.new).call(i)
puts "Hello from iteration #{i}"
end
3.my_times(hello)
# Hello from iteration 0
# Hello from iteration 1
# Hello from iteration 2
0.my_times(hello)
-1.my_times(hello)
# ArgumentError (`self` must be non-negative but is `-1`)
However, this is not actually very readable. If you didn't want to give a name to this action that we are trying to pass but instead simply literally write it down inside the argument list, it would look something like this:
3.my_times(Object.new.tap do |obj|
def obj.call(i)
puts "Hello from iteration #{i}"
end
end)
# Hello from iteration 0
# Hello from iteration 1
# Hello from iteration 2
or on one line:
3.my_times(Object.new.tap do |obj| def obj.call(i); puts "Hello from iteration #{i}" end end)
# Hello from iteration 0
# Hello from iteration 1
# Hello from iteration 2
# or:
3.my_times(Object.new.tap {|obj| def obj.call(i); puts "Hello from iteration #{i}" end })
# Hello from iteration 0
# Hello from iteration 1
# Hello from iteration 2
Now, I don't know about you, but I find that pretty ugly.
In Ruby 1.9, Ruby added Proc literals aka stabby lambda literals to the language. Lambda literals are a concise literal syntax for writing objects with a call method, specifically Proc objects with Proc#call.
Using lambda literals, and without any changes to our existing code, it looks something like this:
3.my_times(-> i { puts "Hello from iteration #{i}" })
# Hello from iteration 0
# Hello from iteration 1
# Hello from iteration 2
This does not look bad!
When Yukihiro "matz" Matsumoto designed Ruby almost thirty years ago in early 1993, he did a survey of the core libraries and standard libraries of languages like Smalltalk, Scheme, and Common Lisp to figure out how such methods that take a piece of code as an argument are actually used, and he found that the overwhelming majority of such methods take exactly one code argument and all they do with that argument is call it.
So, he decided to add special language support for a single argument that contains code and can only be called. This argument is both syntactically and semantically lightweight, in particular, it looks syntactically exactly like any other control structure, and it is semantically not an object.
This special language feature, you probably guessed it, is the block.
Every method in Ruby has an optional block parameter. I can always pass a block to a method. It's up to the method to do anything with the block. Here, for example, the block is useless because Kernel#puts doesn't do anything with a block:
puts("Hello") { puts "from the block" }
# Hello
Because blocks are not objects, you cannot call methods on them. Also, because there can be only one block argument, there is no need to give it a name: if you refer to a block, it's always clear which block because there can be only one. But, if the block doesn't have methods and doesn't have a name, how can we call it?
That's what the yield keyword is for. It temporarily "yields" control flow to the block, or, in other words, it calls the block.
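As a small illustration of yield in isolation (the method name twice is made up for this sketch):

```ruby
# Hypothetical method that calls its block two times.
def twice
  yield 1
  yield 2
end

seen = []
twice { |n| seen << n * 10 }
seen # => [10, 20]

# Yielding when no block was passed raises LocalJumpError.
error = begin
  twice
rescue LocalJumpError => e
  e
end
error.class # => LocalJumpError
```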
With blocks, our solution would look like this:
class Integer
def my_times(&action_to_be_executed)
raise ArgumentError, "`self` must be non-negative but is `#{inspect}`" if negative?
return enum_for(__callee__) unless block_given?
__my_times_helper(&action_to_be_executed)
end
protected
def __my_times_helper(index = 0, &action_to_be_executed) # the block parameter must come last
return if zero?
yield index
pred.__my_times_helper(index + 1, &action_to_be_executed)
end
end
3.my_times do |i|
puts "Hello from iteration #{i}"
end
# Hello from iteration 0
# Hello from iteration 1
# Hello from iteration 2
0.my_times do |i|
puts "Hello from iteration #{i}"
end
-1.my_times do |i|
puts "Hello from iteration #{i}"
end
# ArgumentError (`self` must be non-negative but is `-1`)
Okay, you might notice that I simplified a bit when I wrote above that the only thing you can do with a block is call it. There are two other things you can do with it:
You can check whether a block argument was passed using Kernel#block_given?. Since blocks are always optional, and blocks have no names, there must be a way to check whether a block was passed or not.
You can "roll up" a block (which is not an object and doesn't have a name) into a Proc object (which is an object) and bind it to a parameter (which gives it a name) using the & ampersand unary prefix sigil in the parameter list of the method. Now that we have an object, and a way to refer to it, we can store it in a variable, return it from a method, or (as we are doing here) pass it along as an argument to a different method, which otherwise wouldn't be possible.
There is also the opposite operation: with the & ampersand unary prefix operator, you can "unroll" a Proc object into a block in an argument list; this makes it so that the method behaves as if you had passed the code that is stored inside the Proc as a literal block argument to the method.
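Both directions of the & sigil can be seen in a short sketch; the method name capture is an assumption made for illustration:

```ruby
# "Roll up": &block turns the passed block into a Proc object with a name.
def capture(&block)
  block
end

doubler = capture { |n| n * 2 }
doubler.class    # => Proc
doubler.call(21) # => 42

# "Unroll": &doubler hands the Proc to map as if it were a literal block.
doubled = [1, 2, 3].map(&doubler)
# => [2, 4, 6]
```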
And there you have it! That's what blocks are for: a semantically and syntactically lightweight form of passing code to a method.
There are other possible approaches, of course. The approach that is closest to Ruby is probably Smalltalk. Smalltalk also has a concept called blocks (in fact, that is where Ruby got both the idea and the name from). Similarly to Ruby, Smalltalk blocks have a syntactically light-weight literal form, but they are objects, and you can pass more than one to a method. Thanks to Smalltalk's generally light-weight and simple syntax, especially the keyword method syntax which intersperses parts of the method name with the arguments, even passing multiple blocks to a method call is very concise and readable.
For example, Smalltalk actually does not have an if / then / else conditional expression; in fact, Smalltalk has no control structures at all. Everything is done with methods. So, the way that a conditional works is that the two boolean classes True and False each have a method named ifTrue:ifFalse: which takes two block arguments, and the two implementations will simply either evaluate the first or the second block. For example, the implementation in True might look a little bit like this (note that Smalltalk has no syntax for classes or methods, instead classes and methods are created in the IDE by creating class objects and method objects via the GUI):
True>>ifTrue: trueBlock ifFalse: falseBlock
"Answer with the value of `trueBlock`."
↑trueBlock value
The corresponding implementation in False would then look like this:
False>>ifTrue: trueBlock ifFalse: falseBlock
"Answer with the value of `falseBlock`."
↑falseBlock value
And you would call it like this:
2 < 3 ifTrue: [ Transcript show: 'yes' ] ifFalse: [ Transcript show: 'no' ].
"yes"
4 < 3 ifTrue: [ Transcript show: 'yes' ] ifFalse: [ Transcript show: 'no' ].
"no"
In ECMAScript, you can simply use function definitions as expressions, and there is also lightweight syntax for functions.
In the various Lisps, code is just data, and data is code, so you can just pass the code as an argument as data, then inside the function, treat that data as code again.
Scala has call-by-name parameters which are only evaluated when you use their name, and they are evaluated every time you use their name. It would look something like this:
implicit class IntegerTimes(val i: Int) extends AnyVal {
@scala.annotation.tailrec
def times(actionToBeExecuted: => Unit): Unit = {
if (i < 0) throw new Error()
if (i == 0) () else { actionToBeExecuted; (i - 1).times(actionToBeExecuted) }
}
}
3.times { println("Hello") }
// Hello
// Hello
// Hello

Why is `to_ary` called from a double-splatted parameter in a code block?

It seems that a double-splatted block parameter calls to_ary on an object that is passed, which does not happen with lambda parameters and method parameters. This was confirmed as follows.
First, I prepared an object obj on which a method to_ary is defined, which returns something other than an array (i.e., a string).
obj = Object.new
def obj.to_ary; "baz" end
Then, I passed this obj to various constructions that have a double splatted parameter:
instance_exec(obj){|**foo|}
# >> TypeError: can't convert Object to Array (Object#to_ary gives String)
->(**foo){}.call(obj)
# >> ArgumentError: wrong number of arguments (given 1, expected 0)
def bar(**foo); end; bar(obj)
# >> ArgumentError: wrong number of arguments (given 1, expected 0)
As can be observed above, only the code block tries to convert obj to an array by calling a (potential) to_ary method.
Why does a double-splatted parameter for a code block behave differently from those for a lambda expression or a method definition?
I don't have full answers to your questions, but I'll share what I've found out.
Short version
Procs can be called with a number of arguments different from the number defined in their signature. If the argument list doesn't match the definition, #to_ary is called to perform an implicit conversion. Lambdas and methods require the number of args to match their signature. No conversions are performed, and that's why #to_ary is not called.
Long version
What you describe is a difference between handling params by lambdas (and methods) and procs (and blocks). Take a look at this example:
obj = Object.new
def obj.to_ary; "baz" end
lambda{|**foo| print foo}.call(obj)
# >> ArgumentError: wrong number of arguments (given 1, expected 0)
proc{|**foo| print foo}.call(obj)
# >> TypeError: can't convert Object to Array (Object#to_ary gives String)
Proc doesn't require the same number of args as it defines, and #to_ary is called (as you probably know):
For procs created using lambda or ->(), an error is generated if wrong number of parameters are passed to the proc. For procs created using Proc.new or Kernel.proc, extra parameters are silently discarded and missing parameters are set to nil. (Docs)
What is more, Proc adjusts passed arguments to fit the signature:
proc{|head, *tail| print head; print tail}.call([1,2,3])
# >> 1[2, 3]=> nil
Sources: makandra, SO question.
#to_ary is used for this adjustment (and it's reasonable, as #to_ary is for implicit conversions):
obj2 = Class.new{def to_ary; [1,2,3]; end}.new
proc{|head, *tail| print head; print tail}.call(obj2)
# >> 1[2, 3]=> nil
It's described in detail in a ruby tracker.
You can see that [1,2,3] was split to head=1 and tail=[2,3]. It's the same behaviour as in multi assignment:
head, *tail = [1, 2, 3]
# => [1, 2, 3]
tail
# => [2, 3]
As you have noticed, #to_ary is also called when a proc has double-splatted keyword args:
proc{|head, **tail| print head; print tail}.call(obj2)
# >> 1{}=> nil
proc{|**tail| print tail}.call(obj2)
# >> {}=> nil
In the first case, the array [1, 2, 3] returned by obj2.to_ary was split into head=1 and an empty tail, as **tail wasn't able to match the array [2, 3].
Lambdas and methods don't have this behaviour. They require strict number of params. There is no implicit conversion, so #to_ary is not called.
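The contrast can be sketched with plain arrays and two parameters; the names lenient and strict are chosen only for this example:

```ruby
lenient = proc   { |a, b| [a, b] }
strict  = lambda { |a, b| [a, b] }

# A proc pads, truncates, and auto-splats a single array argument.
lenient.call(1)       # => [1, nil]
lenient.call(1, 2, 3) # => [1, 2]
lenient.call([1, 2])  # => [1, 2]

# A lambda insists on exactly two arguments and never splats.
strict.call(1, 2) # => [1, 2]
error = begin
  strict.call([1, 2])
rescue ArgumentError => e
  e
end
error.class # => ArgumentError
```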
I think that this difference is implemented in these two lines of the Ruby source:
opt_pc = vm_yield_setup_args(ec, iseq, argc, sp, passed_block_handler,
(is_lambda ? arg_setup_method : arg_setup_block));
and in this function. I guess #to_ary is called somewhere in vm_callee_setup_block_arg_arg0_splat, most probably in RARRAY_AREF. I would love to read a commentary of this code to understand what happens inside.

Why does this block not run when it is stored in a proc?

I'm learning ruby and trying to get a better understanding of Blocks, Yield, Procs and Methods and I stumbled upon this example on using yield.
def calculation(a, b)
yield(a, b)
end
x = calculation(5,6) do|a,b|
a + b
end
puts "#{x}"
From what I understand, Procs are objects that hold a pointer to blocks, and blocks need a method to work in the first place. Also, from the way yield is used, I assume yield jumps to the block immediately after the method call.
I assume the code runs this way: calculation(5,6) calls the method calculation(). When the yield instruction executes, a and b are passed to the block after calculation(5,6). To experiment and get a better understanding, I tried doing this.
def calculation(a, b)
yield(a, b)
end
ankh = Proc.new do |a,b|
a + b
end
x = calculation(5,6) *ankh
The error says that no block is given to calculation(). But aren't we giving calculation(5,6) the block ankh? Hopefully my question isn't too confusing.
The problem is the line x = calculation(5,6) *ankh: Ruby reads the * as multiplication, so no block is passed. To pass a Proc as a block, you use the & operator.
x = calculation(5,6,&ankh)
First off: what you wrote doesn't make any sense. Think about it: what does
calculation(5, 6) * ankh
mean? Or, more abstractly, what does
foo * bar
mean? Does 2 * 3 really mean "call 2 and pass 3 as a block"?
The error says that no block is given to calculation(). But aren't we giving calculation(5,6) the block ankh?
No, ankh is not a block, it's a Proc. A block is a purely syntactic construct. Most importantly, a block is not an object, so you simply cannot store it in a variable at all. You also cannot pass it as a normal argument to a method, you have to pass it as a separate "special" block argument. Blocks do not exist independent from method calls.
There is, however, a way of "converting" a Proc into a block: the & ampersand unary prefix operator:
x = calculation(5, 6, &ankh)
# => 11
This tells Ruby to take the Proc ankh and turn it into a block. In fact, this mechanism is much more general than that, because you can even pass an object which is not a Proc and Ruby will first call to_proc on that object to allow it to convert itself to a Proc.
For example, Method implements to_proc, so you can pass Methods as blocks:
def ankh(a, b) a + b end
x = calculation(5, 6, &method(:ankh))
# => 11
Also, Symbol implements to_proc:
x = calculation(5, 6, &:+)
# => 11
Lastly, Hash implements to_proc as well.
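A short sketch of what that looks like, assuming a reasonably recent Ruby where Hash#to_proc is available; the hash contents are made up for the example:

```ruby
prices = { apple: 3, pear: 5 }

# The resulting proc looks up its argument as a key in the hash.
lookup = prices.to_proc
lookup.call(:apple) # => 3

# Passing the hash with & converts it via to_proc first.
found = %i[apple pear plum].map(&prices)
# => [3, 5, nil]
```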
And, of course, you can write your own objects that implement to_proc:
def (ankh = Object.new).to_proc
-> *args { "I was called with arguments #{args.inspect}!" }
end
x = calculation(5, 6, &ankh)
# => 'I was called with arguments [5, 6]!'

Possible to instance_eval a curried proc?

Suppose I have a class such as this:
class Test
def test_func
140
end
end
And a proc, which references a member function from Test:
p = ->(x, y) { x + y + test_func } # => #<Proc:0x007fb3143e7f78@(pry):6 (lambda)>
To call p, I bind it to an instance of Test:
test = Test.new # => #<Test:0x007fb3143c5a68>
test.instance_exec(1, 2, &p) # => 143
Now suppose I want to pass just y to p, and always pass x = 1:
curried = p.curry[1] # => #<Proc:0x007fb3142be070 (lambda)>
Ideally I should be able to just instance_exec as before, but instead:
test.instance_exec(2, &curried)
# => NameError: undefined local variable or method `test_func' for main:Object
The proc runs in what seems to be the incorrect binding. What gives?
Yes, I believe this is a bug.
I think it comes down to the fact that curry returns a "C level proc" rather than a normal proc. I don't fully understand the difference between the two (I'm guessing the former is one created by the Ruby C code which is what curry does), but you can tell they're different when you try and take a binding.
p.binding # => #<Binding:0x000000020b4238>
curried.binding # => ArgumentError: Can't create a binding from C level Proc
By looking at the source, this looks like their internal struct representations have different values for the iseq member, which says what kind of instruction sequence this block holds.
This is significant when you call instance_exec, which eventually ends up calling invoke_block_from_c in vm.c, which branches depending on the iseq type:
else if (BUILTIN_TYPE(block->iseq) != T_NODE) {
...
} else {
return vm_yield_with_cfunc(th, block, self, argc, argv, blockptr);
}
The branch I missed out (...) ends up calling vm_push_frame with what looks like some environment, whereas vm_yield_with_cfunc doesn't.
So my guess would be that because the curried proc is created in C code and ends up of a different 'type' than your first proc, the other branch is taken in the above snippet and the environment isn't used.
I should point out that all of this is pretty speculative based on reading the code, I haven't run any tests or tried anything out (and I'm also not all that familiar with internal Ruby anyway!)
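One possible way around this, offered as an untested-against-internals sketch rather than a confirmed fix, is to skip curry and fix the first argument with a small wrapper lambda that itself calls instance_exec. The names adder and with_one are assumptions introduced here (adder plays the role of p from the question):

```ruby
class Test
  def test_func
    140
  end
end

adder = ->(x, y) { x + y + test_func }
test = Test.new

# Wrapper that supplies x = 1 and re-enters instance_exec on the current self.
with_one = ->(y) { instance_exec(1, y, &adder) }

result = test.instance_exec(2, &with_one)
# => 143
```

Because with_one is an ordinary Ruby lambda, instance_exec can rebind self for it, and the inner instance_exec then runs adder against the same object.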

Passing a hash to a function ( *args ) and its meaning

When using an idiom such as:
def func(*args)
# some code
end
What is the meaning of *args? Googling this specific question was pretty hard, and I couldn't find anything.
It seems all the arguments actually appear in args[0] so I find myself writing defensive code such as:
my_var = args[0].delete(:var_name) if args[0]
But I'm sure there's a better way I'm missing out on.
The * is the splat (or asterisk) operator. In the context of a method, it specifies a variable length argument list. In your case, all arguments passed to func will be put into an array called args. You could also specify specific arguments before a variable-length argument like so:
def func2(arg1, arg2, *other_args)
# ...
end
Let's say we call this method:
func2(1, 2, 3, 4, 5)
If you inspect arg1, arg2 and other_args within func2 now, you will get the following results:
def func2(arg1, arg2, *other_args)
p arg1 # => 1
p arg2 # => 2
p other_args # => [3, 4, 5]
end
In your case, you seem to be passing a hash as an argument to your func, in which case, args[0] will contain the hash, as you are observing.
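A minimal sketch of how *args collects its arguments, including the single-hash case from the question (the hash contents are made up for illustration):

```ruby
def func(*args)
  args
end

func          # => []
func(1, 2, 3) # => [1, 2, 3]

# A hash passed as the only argument becomes the single element of args.
options = { :var_name => "x" }
collected = func(options)
collected.length          # => 1
collected[0] == options   # => true
```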
Resources:
Variable Length Argument List, Asterisk Operator
What is the * operator doing
Update based on OP's comments
If you want to pass a Hash as an argument, you should not use the splat operator. Ruby lets you omit brackets, including those that specify a Hash (with a caveat, keep reading), in your method calls. Therefore:
my_func arg1, arg2, :html_arg => value, :html_arg2 => value2
is equivalent to
my_func(arg1, arg2, {:html_arg => value, :html_arg2 => value2})
When Ruby sees the => operator in your argument list, it knows to take the argument as a Hash, even without the explicit {...} notation (note that this only applies if the hash argument is the last one!).
If you want to collect this hash, you don't have to do anything special (though you probably will want to specify an empty hash as the default value in your method definition):
def my_func(arg1, arg2, html_args = {})
# ...
end
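To see the brace-less call and the default value in action, here is a sketch in which my_func simply returns what it received; the argument values are made up for illustration:

```ruby
def my_func(arg1, arg2, html_args = {})
  [arg1, arg2, html_args]
end

# The trailing key => value pairs are gathered into one hash argument.
with_options = my_func(1, 2, :class => "big", :id => "main")
with_options[2] == { :class => "big", :id => "main" } # => true

# Without them, the default empty hash is used.
without_options = my_func(1, 2)
without_options[2] # => {}
```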
