Possible to instance_eval a curried proc?

Suppose I have a class such as this:
class Test
  def test_func
    140
  end
end
And a proc, which references a member function from Test:
p = ->(x, y) { x + y + test_func } # => #<Proc:0x007fb3143e7f78@(pry):6 (lambda)>
To call p, I bind it to an instance of Test:
test = Test.new # => #<Test:0x007fb3143c5a68>
test.instance_exec(1, 2, &p) # => 143
Now suppose I want to pass just y to p, and always pass x = 1:
curried = p.curry[1] # => #<Proc:0x007fb3142be070 (lambda)>
Ideally I should be able to just instance_exec as before, but instead:
test.instance_exec(2, &curried)
=> NameError: undefined local variable or method `test_func' for main:Object
The proc runs in what seems to be the incorrect binding. What gives?

Yes, I believe this is a bug.
I think it comes down to the fact that curry returns a "C level proc" rather than a normal proc. I don't fully understand the difference between the two (I'm guessing the former is one created by the Ruby C code which is what curry does), but you can tell they're different when you try and take a binding.
p.binding # => #<Binding:0x000000020b4238>
curried.binding # => ArgumentError: Can't create a binding from C level Proc
Looking at the source, it appears their internal struct representations hold different values for the iseq member, which says what kind of instruction sequence the block holds.
This is significant when you call instance_exec, which eventually ends up calling invoke_block_from_c in vm.c, which branches depending on the iseq type:
else if (BUILTIN_TYPE(block->iseq) != T_NODE) {
    ...
} else {
    return vm_yield_with_cfunc(th, block, self, argc, argv, blockptr);
}
The branch I left out (...) ends up calling vm_push_frame with what looks like some environment, whereas vm_yield_with_cfunc doesn't.
So my guess would be that because the curried proc is created in C code and ends up being of a different 'type' than your first proc, the other branch is taken in the snippet above and the environment isn't used.
I should point out that all of this is pretty speculative, based on reading the code; I haven't run any tests or tried anything out (and I'm not all that familiar with Ruby's internals anyway!)
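For what it's worth, here is a workaround sketch (my own, not from the answer above, reusing p and test from the question): if you build the partially-applied proc in plain Ruby instead of going through Proc#curry, it stays an ordinary Ruby-level proc and instance_exec rebinds self as expected.
curried_by_hand = ->(y) { instance_exec(1, y, &p) } # fix x = 1, defer y
test.instance_exec(2, &curried_by_hand) # => 143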

Related

What is the purpose of the 'proc' parameter in Marshal::load?

I have looked for resources which explain its purpose. I couldn't find any real world implementations either.
Below is the extract from Ruby's documentation:
load( source [, proc] ) → obj
Returns the result of converting the serialized data in source into a Ruby object (possibly with associated subordinate objects). source may be either an instance of IO or an object that responds to to_str. If proc is specified, each object will be passed to the proc, as the object is being deserialized.
I would appreciate an example of its usage, or at least direct me to some resources.
You can see how the proc is invoked by doing something like:
irb(main):030:0> Marshal.load(Marshal.dump(a:1), lambda { |x| p [self,x]; x })
[main, :a]
[main, 1]
[main, {:a=>1}]
=> {:a=>1}
When used with a marshalled string, the proc is for some reason invoked twice.
irb(main):031:0> Marshal.load(Marshal.dump('a'), lambda { |x| p [self,x]; x })
[main, "a"]
[main, true]
=> "a"
Transformation of Deserialized Objects
A general use case is that you want to perform some action or transformation on the object you're deserializing. For example, using some Ruby 2.7.1 shortcuts:
Marshal.load Marshal.dump("abc"), ->{ _1.to_s.upcase }
#=> "ABC"
This doesn't add much value when deserializing a single object, but could be very useful if you're handling dumped objects in bulk. I can't think of a pragmatic use case where you couldn't transform after deserializing, but Ruby is full of useful tools for handling things without intermediate steps. This seems to be one of them.
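As a concrete bulk-handling sketch (my own example, not from the original answer): freeze every String as it is deserialized. Because freeze mutates the object in place, this relies only on the documented behaviour that each object is passed to the proc.
freezer = ->(obj) { obj.is_a?(String) ? obj.freeze : obj }
data    = Marshal.dump(["abc", "def", 42])
loaded  = Marshal.load(data, freezer)
loaded                              # => ["abc", "def", 42]
loaded.grep(String).all?(&:frozen?) # => true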
Possible Bug: Procs Appear to Run Twice but Return Once
In the example above, I coerce the first positional argument to a string because otherwise I get a NoMethodError on one of the two passes this is making through the lambda. You can sort of unpack what's going on (but perhaps not why) as follows:
prc = proc { |obj| pp obj }
Marshal.load Marshal.dump("abc"), prc
"abc"
true
#=> "abc"
For whatever reason, the body of a Proc or lambda is called twice, but only the first pass's result is returned. The problem occurs when the second invocation raises a NoMethodError because upcase is called on true, so the call never returns a value.
Another way to handle this is to explicitly handle the exception, e.g.:
prc = proc { |obj| obj.upcase rescue NoMethodError }
or to avoid invoking methods on true:
prc = proc { |obj| obj.upcase unless obj == true }
While I can explain what is going on, and how to work around it, I can't tell you why invoking with a proc-like object behaves this way. That's a question for the Ruby Core Team, or fodder for the Ruby bug tracker.

Why does this block not run when it is stored in a proc?

I'm learning ruby and trying to get a better understanding of Blocks, Yield, Procs and Methods and I stumbled upon this example on using yield.
def calculation(a, b)
  yield(a, b)
end
x = calculation(5, 6) do |a, b|
  a + b
end
puts "#{x}"
From what I understand, Procs are objects that hold a pointer to a block, and blocks need a method to work with in the first place. Also, from the way yield is used, I assume yield jumps to the block immediately following the method call.
I assume the code runs this way: calculation(5,6) calls the method calculation(). When the yield instruction executes, a and b are passed to the block after calculation(5,6). To experiment and get a better understanding I tried doing this.
def calculation(a, b)
  yield(a, b)
end
ankh = Proc.new do |a, b|
  a + b
end
x = calculation(5,6) *ankh
The error says that no block is given to calculation(). But aren't we giving calculation(5,6) the block ankh? Hopefully my question isn't too confusing.
The error comes from the line x = calculation(5,6) *ankh. To pass a proc as a block, you use the & operator.
x = calculation(5,6,&ankh)
First off: what you wrote doesn't make any sense. Think about it: what does
calculation(5, 6) * ankh
mean? Or, more abstractly, what does
foo * bar
mean? Does 2 * 3 really mean "call 2 and pass 3 as a block"?
The error says that no block is given to calculation(). But aren't we giving calculation(5,6) the block ankh?
No, ankh is not a block, it's a Proc. A block is a purely syntactic construct. Most importantly, a block is not an object, so you simply cannot store it in a variable at all. You also cannot pass it as a normal argument to a method, you have to pass it as a separate "special" block argument. Blocks do not exist independent from method calls.
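To see that "special" block slot from the other direction (my own sketch, not part of the original answer): a method can reify the block it receives into a Proc by declaring an &-parameter, which is the inverse of passing a Proc in with &.
def capture(&blk)
  blk                 # the block, reified as a Proc object
end
captured = capture { |a, b| a + b }
captured.class      # => Proc
captured.call(5, 6) # => 11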
There is, however, a way of "converting" a Proc into a block: the & ampersand unary prefix operator:
x = calculation(5, 6, &ankh)
# => 11
This tells Ruby to take the Proc ankh and turn it into a block. In fact, this mechanism is much more general than that, because you can even pass an object which is not a Proc and Ruby will first call to_proc on that object to allow it to convert itself to a Proc.
For example, Method implements to_proc, so you can pass Methods as blocks:
def ankh(a, b) a + b end
x = calculation(5, 6, &method(:ankh))
# => 11
Also, Symbol implements to_proc:
x = calculation(5, 6, &:+)
# => 11
Lastly, Hash implements to_proc as well.
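For completeness, a small sketch of that one (my own example, assuming Ruby 2.3+ where Hash#to_proc exists): the resulting proc maps a key to its value, which is handy with map.
mapping = { 5 => "five", 6 => "six" }
[5, 6].map(&mapping) # => ["five", "six"]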
And, of course, you can write your own objects that implement to_proc:
def (ankh = Object.new).to_proc
  -> *args { "I was called with arguments #{args.inspect}!" }
end
x = calculation(5, 6, &ankh)
# => 'I was called with arguments [5, 6]!'

Why do you have to specify 2 arguments explicitly to curry :>

Consider this, which works fine:
:>.to_proc.curry(2)[9][8] #=> true, because 9 > 8
However, even though > is a binary operator, the above won't work without the arity specified:
:>.to_proc.curry[9][8] #=> ArgumentError: wrong number of arguments (0 for 1)
Why aren't the two equivalent?
Note: I specifically want to create the intermediate curried function with one arg supplied, and then call that with the 2nd arg.
curry has to know the arity of the proc passed in, right?
:<.to_proc.arity # => -1
Negative values from arity are confusing, but basically mean 'variable number of arguments' one way or another.
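To make those negative values concrete, here is a quick illustration (my own sketch, not from the original answer) of the encoding: -(n + 1) means n required parameters plus optional or rest arguments.
proc {}.arity             # =>  0
proc { |a, *rest| }.arity # => -2 (one required parameter plus a rest parameter)
->(*args) {}.arity        # => -1 (no required parameters, anything goes)
:<.to_proc.arity          # => -1 (same story: variable arguments)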
Compare to:
less_than = lambda {|a, b| a < b}
less_than.arity # => 2
When you create a lambda saying it takes two arguments, it knows it takes two arguments, and will work fine with that style of calling #curry.
less_than.curry[9][8] # => false, no problem!
But when you use the symbol #to_proc trick, it's just got a symbol to go on; it has no idea how many arguments the method takes. Even though < necessarily takes two values (a receiver and one argument), Symbol#to_proc is a general-purpose mechanism that works on any method name, so it defines the proc with variable arguments.
I don't read C well enough to follow the MRI implementation, but I assume Symbol#to_proc defines a proc with variable arguments. The more typical use of Symbol#to_proc, of course, is for no-argument methods. You can, for instance, do this with it if you want:
hello_proc = :hello.to_proc
class SomeClass
  def hello(name = nil)
    puts "Hello, #{name}!"
  end
end
obj = SomeClass.new
obj.hello #=> "Hello, !"
obj.hello("jrochkind") #=> "Hello, jrochkind!"
obj.hello("jrochkind", "another")
# => ArgumentError: wrong number of arguments calling `hello` (2 for 1)
hello_proc.call(obj) # => "Hello, !"
hello_proc.call(obj, "jrochkind") # => "Hello, jrochkind!"
hello_proc.call(obj, "jrochkind", "another")
# => ArgumentError: wrong number of arguments calling `hello` (2 for 1)
hello_proc.call("Some string")
# => NoMethodError: undefined method `hello' for "Some string":String
Note that I did hello_proc = :hello.to_proc before I even defined SomeClass. The Symbol#to_proc mechanism creates a variable-arity proc that knows nothing about how, where, or on what class it will be called; it creates a proc that can be called on any class at all, with any number of arguments.
If it were defined in ruby instead of C, it would look something like this:
class Symbol
  def to_proc
    method_name = self
    proc { |receiver, *other_args| receiver.send(method_name, *other_args) }
  end
end
I think it is because Symbol#to_proc creates a proc with one argument. When turned into a proc, :> does not look like:
->x, y{...}
but it looks like:
->x{...}
with the argument to > picked up through a rest parameter rather than a declared second parameter (notice that > is not a function that takes two arguments; it is a method called on one receiver with one argument). In fact,
:>.to_proc.arity # => -1
->x, y{}.arity # => 2
which means that applying curry to it without an argument has only a trivial effect; it takes a proc with one parameter and returns it essentially unchanged. By explicitly specifying 2, it does something non-trivial. For comparison, consider join:
:join.to_proc.arity # => -1
:join.to_proc.call(["x", "y"]) # => "xy"
:join.to_proc.curry.call(["x", "y"]) # => "xy"
Notice that providing a single argument after Currying :join already evaluates the whole method.
@jrochkind's answer does a great job of explaining why :>.to_proc.curry doesn't have the behavior you want. I wanted to mention, though, that there's a solution to this part of your question:
I specifically want to create the intermediate curried function with one arg supplied, and then call that with the 2nd arg.
The solution is Object#method. Instead of this:
nine_is_greater_than = :>.to_proc.curry[9]
nine_is_greater_than[8]
#=> ArgumentError: wrong number of arguments (0 for 1)
...do this:
nine_is_greater_than = 9.method(:>)
nine_is_greater_than[8]
# => true
Object#method returns a Method object, which acts just like a Proc: it responds to call, [], and even (as of Ruby 2.2) curry. However, if you need a real proc (or want to use curry with Ruby < 2.2) you can also call to_proc on it (or use &, the to_proc operator):
[ 1, 4, 8, 10, 20, 30 ].map(&nine_is_greater_than)
# => [ true, true, true, false, false, false ]

Ruby: tap writes on a read?

So if I understand correctly Object#tap uses yield to produce a temporary object to work with during the execution of a process or method. From what I think I know about yield, it does something like, yield takes (thing) and gives (thing).dup to the block attached to the method it's being used in?
But when I do this:
class Klass
  attr_accessor :hash
  def initialize
    @hash = {'key' => 'value'}
  end
end
instance = Klass.new
instance.instance_variable_get('@hash')['key'] # => 'value', as it should
instance.instance_variable_get('@hash').tap { |pipe| pipe['key'] = 'new value' }
instance.instance_variable_get('@hash')['key'] # => 'new value'... wut?
I was under the impression that yield -> new_obj. I don't know how correct that is, though. I tried to look it up on ruby-doc, but Enumerator::Yielder is empty, yield(proc) isn't there, and as for the fiber version... I don't have any fibers; in fact, doesn't Ruby explicitly require 'fiber' to use them?
So what ought to have been a read on the instance variable and a write on the temp is instead a read/write on the instance variable... which is cool, because that's what I was trying to do and accidentally found when I was looking up a way to deal with hashes as instance variables (for some larger-than-I'm-used-to tables of named variables), but now I'm slightly confused, and I can't find a description of the mechanism that makes this happen.
Object#tap couldn't be simpler:
VALUE
rb_obj_tap(VALUE obj)
{
    rb_yield(obj);
    return obj;
}
(from the documentation). It just yields and then returns the receiver. A quick check in IRB shows that yield yields the object itself rather than a new object.
def foo
  x = {}
  yield x
  x
end
foo { |y| y['key'] = :new_value }
# => {"key" => :new_value }
So the behavior of tap is consistent with yield, as we would hope.
tap does not duplicate the receiver. The block variable is assigned the receiver itself, and tap then returns that same receiver. So when you do tap{|pipe| pipe['key'] = 'new value'}, the receiver of tap is modified. To my understanding,
x.tap{|x| foo(x)}
is equivalent to:
foo(x); x
and
y.tap{|y| y.bar}
is equivalent to:
y.bar; y
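If you actually wanted the copy-then-modify behaviour the question assumed, you have to make the copy yourself; for example (my own sketch, nothing tap does for you):
original = { 'key' => 'value' }
copy     = original.dup.tap { |h| h['key'] = 'new value' }
original # => {"key"=>"value"}
copy     # => {"key"=>"new value"}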

instance_eval's block argument(s) - documented? purpose?

Just realized that instance_eval yields self as an argument to the associated block (except for a bug in the 1.9.2 version: http://www.ruby-forum.com/topic/189422)
1.9.3p194 :003 > class C;end
1.9.3p194 :004 > C.new.instance_eval {|*a| a}
=> [#<C:0x00000001f99dd0>]
1.9.3p194 :005 >
Is this documented/spec'ed somewhere? Looking at ruby-doc:BasicObject, can't see any block params mentioned.
Is there a reason (apart from some purely historical one) for passing it explicitly when self is always defined anyway?
The way I was hit by this is:
l = lambda { }
myobj.instance_eval(&l) # barks
This worked fine in 1.8.x (I guess because block arity wasn't enforced).
Then I upgraded to 1.9.2 - and it still worked! That's a strange coincidence: even though lambda block arity is strictly enforced (so it should have complained about the undeclared parameter for self), due to the bug linked above self actually wasn't passed in this version.
Then I upgraded to 1.9.3, where that bug got fixed, so it started throwing the ArgumentError - pretty surprising for a minor version change IMHO.
So one workaround is to declare the parameter, or to make l a plain proc instead of a lambda:
l = proc { }
myobj.instance_eval(&l) # fine
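The "declare the parameter" workaround mentioned above would look like this (a sketch; instance_eval passes self, so the lambda just has to accept it):
l = lambda { |receiver| receiver.class }
myobj = Object.new
myobj.instance_eval(&l) # => Object, no ArgumentError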
I just thought I'd describe the full story to help others avoid wasting time the way I did - at least until this is properly documented.
Reading Ruby's source code, what I can interpret is:
instance_eval is executing this:
return specific_eval(argc, argv, klass, self)
which in turn runs:
if (rb_block_given_p()) {
    if (argc > 0) {
        rb_raise(rb_eArgError, "wrong number of arguments (%d for 0)", argc);
    }
    return yield_under(klass, self, Qundef);
}
You can see they pass Qundef for the VALUES argument.
if (values == Qundef) {
    return vm_yield_with_cref(th, 1, &self, cref);
}
In that particular line of code, they manually set argc (the argument count) to 1 and pass self as the single argument. Later, the code that prepares the block arguments uses these values, hence the first block argument is self and the rest are nil.
The code that sets up the block arguments does this:
arg0 = argv[0];
... bunch of code ...
else {
    argv[0] = arg0;
}
for (i=argc; i<m; i++) {
    argv[i] = Qnil;
}
Resulting in:
1.9.3p194 :006 > instance_eval do |x, y, z, a, b, c, d| x.class end
=> Object
1.9.3p194 :008 > instance_eval do |x, y, z, a, b, c, d| y.class end
=> NilClass
Why? I have no idea, but the code seems to be intentional. It would be nice to ask the implementers and see what they have to say about it.
[Edit]
This is probably like that because the blocks you pass to instance_eval may or may not be crafted for it (i.e. code that depends on self being set to the object you want the block to work on). Blocks that aren't may instead assume you will pass them the instance you want them to modify as an argument, and this way they work with instance_eval as well.
irb(main):001:0> blk = Proc.new do |x| x.class end
=> #<Proc:0x007fd2018447b8@(irb):1>
irb(main):002:0> blk.call
=> NilClass
irb(main):003:0> instance_eval &blk
=> Object
Of course this is only a theory and without official documentation I can only guess.
I have just discovered that, unlike #instance_eval, which is primarily intended for string evaluation, #instance_exec, which is primarily intended for block evaluation, does not have the described behavior:
o = Object.new
o.instance_exec { |*a| puts "a.size is #{a.size}" }
=> a.size is 0
This is probably an unintended inconsistency, so you might have discovered a bug. Post it on Ruby bugs.
I just asked the same question here: Ruby lambda's proc's and 'instance_eval'
And after reading the answer and working through some code, I think I understand why ruby has this strange (IMHO) inconsistency.
It basically allows Symbol#to_proc to work.
A proc built from a symbol takes its receiver as its first parameter. For example ["foo", "bar"].map(&:upcase) is short for [...].map { |x| x.upcase }
NOT
[...].map { self.upcase }
So ruby also passes self as the first param to the proc given to instance_eval, and the proc can then use either self or its first param.
Since instance_eval does not, by definition, take explicit params to pass along, this behavior is almost always invisible.
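You can see the interplay directly (my own illustration, not from the original answer): a symbol-derived proc expects the receiver as its first parameter, and instance_eval supplies exactly that.
"hello".instance_eval(&:upcase) # => "HELLO"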
The exception is when the proc is a lambda. This DOES NOT WORK:
2.4.1 :015 > foo = -> { puts 'hi' }
=> #<Proc:0x007fcb578ece78@(irb):15 (lambda)>
2.4.1 :016 > [1, 2, 3].each(&foo)
ArgumentError: wrong number of arguments (given 1, expected 0)
from (irb):15:in `block in irb_binding'
from (irb):16:in `each'
from (irb):16
So I think the only time this becomes a problem is when instance_eval is being used with some unknown value, where you don't know if the proc is a lambda or not. In this case you have to do it like this:
proc_var.lambda? ? instance_exec(&proc_var) : instance_eval(&proc_var)
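Wrapped up as a tiny helper, that dispatch might look like this (a sketch with a hypothetical name, not a standard API):
def eval_on(obj, callable)
  # lambdas enforce arity, so use instance_exec (which passes no extra args);
  # plain procs tolerate instance_eval's extra self argument
  callable.lambda? ? obj.instance_exec(&callable) : obj.instance_eval(&callable)
end
eval_on("abc", proc { upcase }) # => "ABC"
eval_on("abc", -> { upcase })   # => "ABC"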
Weird (to me) that ruby just does not do this under the hood for you.
but I guess you could make it so:
class Object # reopen Object so the override stays a public method
  alias original_instance_eval instance_eval

  def instance_eval(*args, &block)
    block&.lambda? ? instance_exec(&block) : original_instance_eval(*args, &block)
  end
end
