How proc is executed when passed to `instance_exec` - ruby

The question is inspired by this one.
Proc::new has an option to be called without a block inside a method:
Proc::new may be called without a block only within a method with an attached block, in which case that block is converted to the Proc object.
When the proc/lambda instance is passed as a code block, the new instance of Proc is being created:
Proc.singleton_class.prepend(Module.new do
  def new(*args, &cb)
    puts "PROC #{[block_given?, cb, *args].inspect}"
    super
  end
end)
Proc.prepend(Module.new do
  def initialize(*args, &cb)
    puts "INIT #{[block_given?, cb, *args].inspect}"
    super
  end
  def call(*args, &cb)
    puts "CALL #{[block_given?, cb, *args].inspect}"
    super
  end
end)
λ = ->(*args) { }
[1].each &λ
#⇒ [1]
As one might see, neither the call to Proc::new happened, nor Proc#initialize and/or Proc#call were called.
The question is: how does Ruby create and execute a block wrapper under the hood?
NB: Don’t test the code above in a pry/irb console: they are known to have glitches with this kind of code, basically because they patch procs.

There has been some discussion of this behavior on the Ruby Issue Tracker, see Feature #10499: Eliminate implicit magic in Proc.new and Kernel#proc.
This is an implementation artifact of YARV: YARV pushes a block on the global VM stack, and Proc::new simply creates a Proc from the topmost block on the stack. So, if you happen to call Proc.new from within a method which was called with a block, it will happily grab whatever block is on top of the stack, without ever checking where it came from. Somehow, somewhere, in the mists of time, this (let's call it) "accidental artifact" (I'd actually rather call it a bug) became a documented feature. A feature that the developers of JRuby (and presumably Rubinius, Opal, MagLev, etc.) would rather get rid of.
Since most other implementations work completely differently, this behavior, which comes "for free" on YARV, makes both blocks and Proc::new potentially more expensive on other implementations and prohibits possible optimizations (which doesn't hurt on YARV, because YARV doesn't optimize).
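As a small check of the observation in the question (no Proc::new call shows up when a lambda is passed with &), the sketch below captures the block inside a method and compares it with the lambda that was passed in. The helper name `capture` is hypothetical and not part of the original post; the behavior shown is what MRI does, and other implementations may differ.

```ruby
# Hypothetical helper: returns the block it received as a Proc object.
def capture(&blk)
  blk
end

ORIGINAL = ->(*args) { args }
CAPTURED = capture(&ORIGINAL)

# On MRI the lambda is handed through as-is; no new Proc is built from it.
puts CAPTURED.equal?(ORIGINAL)  # prints true
puts CAPTURED.lambda?           # prints true
```

Because the same object travels through the call, there is nothing for Proc::new or Proc#initialize to do, which is consistent with the hooks in the question never firing.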

Related

When to use implicit or explicit code blocks

I'm trying to understand when one should use code blocks implicitly or explicitly. Given the following code blocks:
Implicit
def two_times_implicit
  return "No block" unless block_given?
  yield
  yield
end
puts two_times_implicit { print "Hello "}
puts two_times_implicit
Explicit
def two_times_explicit(&i_am_a_block)
  return "No block" if i_am_a_block.nil?
  i_am_a_block.call
  i_am_a_block.call
end
puts two_times_explicit { puts "Hello"}
puts two_times_explicit
Is it preferable to code using one over the other? Is there a standard practice and are there instances where one would work better or differently than the other and where one would not work at all?
Receiving a block via & creates a new proc object out of the block, so from the point of view of efficiency, it is better not to use it. However, using & generally makes it easier to define methods that may or may not take a block, and using &, you can also handle blocks together with arguments, so it is preferred by many.
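To illustrate the second half of that answer, here is a minimal sketch of a method that takes a regular argument together with an optional explicit block and forwards that block to another method. The method name `each_twice` and its return values are made up for this example.

```ruby
# Hypothetical example: an explicit block parameter next to a regular argument.
def each_twice(items, &block)
  return :no_block if block.nil?   # the block is optional
  2.times { items.each(&block) }   # forward the same block to Array#each
  :done
end

SEEN = []
each_twice([1, 2]) { |n| SEEN << n }
p SEEN                 # prints [1, 2, 1, 2]
p each_twice([1, 2])   # prints :no_block
```

With yield alone, handing the block on to `items.each` would need an extra wrapper block such as `items.each { |n| yield n }`.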
Actually, according to one very interesting read, the second variant is 439% slower (related thread on HackerNews).
TL;DR: Creating and passing a block via yield is a highly optimized common case in MRI, handled by a dedicated C function in the interpreter, while passing &block is implemented differently and carries the significant overhead of creating a new environment and a new Proc on every call.
Summing up, use &block only if you need to pass the block on (for example, to another method) or manipulate it in some other way. Otherwise, use yield, since it's way faster.
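If you want to see how large the gap is on your own Ruby, a rough timing sketch along these lines may help. The helper `time_calls` and the iteration count are arbitrary choices for illustration, and the numbers will vary by Ruby version and machine (newer MRI releases have narrowed the difference), so no expected figures are given.

```ruby
def call_with_yield
  yield
end

def call_with_block(&block)
  block.call
end

# Hypothetical micro-benchmark helper: runs the given block `iterations`
# times and returns the elapsed wall-clock time in seconds.
def time_calls(iterations)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

ITERATIONS = 200_000
yield_time = time_calls(ITERATIONS) { call_with_yield { 1 } }
block_time = time_calls(ITERATIONS) { call_with_block { 1 } }

printf("yield:  %.4fs\n&block: %.4fs\n", yield_time, block_time)
```

A single run like this is noisy; treat it as a sanity check rather than a measurement.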

In Ruby, how do sub, gsub (and other text methods) in shell one-liners work without referring to an object?

I saw this piece of code somewhere on the web:
ruby -pe 'gsub /^\s*|\s*$/, ""'
Evidently this piece of code removes leading and trailing whitespace from each line from STDIN.
I understand the regex and replacement, no problem, but what I don't get is how the method gsub is receiving an object to act upon. I understand that the -p flag wraps this whole thing in a while gets; print; ... ; end block, but how does gsub receive the string to act upon? At the very least, shouldn't it be a $_.gsub(..) instead? How does the current input line get "magically" passed to gsub?
Does the code in these Perl-like one-liners get interpreted in a somewhat different manner? I'm looking for a general idea of the differences from traditional, script-based Ruby code. Haven't found a comprehensive set of resources on this, I'm afraid.
It turns out that this is an instance method defined on Kernel, which magically gets turned on only when you use the -p or -n flag.
ruby -pe 'puts method(:gsub);'
#<Method: Object(Kernel)#gsub>
See the documentation here.
Other magical methods I found are chop, print, and sub.
The magical methods are all sent to $_ implicitly.
Easy:
class Object
  def gsub(*args, &block)
    $_.gsub(*args, &block)
  end
end
Since every object is an instance of Object (well, almost every object), every object has a gsub method now. So, you can call
some_object.gsub('foo', 'bar')
on any object, and it will just work. And since it doesn't matter what object you call it on, because it doesn't actually do anything with that object, you might just as well call it on self:
self.gsub('foo', 'bar')
Of course, since self is the implicit receiver, this is the same as
gsub('foo', 'bar')
For methods such as this, which don't actually depend on the receiver, and are only added to the Object class for convenience reasons, it is a common convention to make them private so that you cannot accidentally call them with an explicit receiver and then somehow get confused into thinking that this method does something to the receiver.
Also, it is common to put such methods (which are actually intended to be used more like procedures than methods, i.e. completely independent of their receiver) into the Kernel mixin, which is mixed into Object instead of directly into the Object class to distinguish them from methods that are available to every object but actually do depend on its internal state, such as Object#class, Object#to_s etc.
module Kernel
  private
  def gsub(*args, &block)
    $_.gsub(*args, &block)
  end
end
Other methods that are defined in this way, which you may have come across already are require, load, puts, print, p, gets, loop, raise, rand, throw, catch, lambda, proc, eval, Array, Integer, Float etc.
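The private-in-Kernel convention described above can be tried out with a throwaway method. In this sketch, `shout` is a hypothetical helper invented for the example; it is not a real Kernel method.

```ruby
module Kernel
  private

  # Hypothetical receiver-independent helper, for illustration only.
  def shout(text)
    text.to_s.upcase
  end
end

# With the implicit receiver it behaves like a global procedure.
puts shout("hello")   # prints HELLO

# With an explicit receiver the call is rejected, because the method is private.
begin
  Object.new.shout("hello")
rescue NoMethodError
  puts "private method: explicit receiver not allowed"
end

# Built-ins such as puts are set up the same way.
puts Kernel.private_instance_methods.include?(:puts)   # prints true
```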

Ruby Blocks, Procs and Local Variables

In Ruby, procs seem to have access to local variables that were present at the time they were declared, even if they are executed in a different scope:
module Scope1
  def self.scope1_method
    puts "In scope1_method"
  end
end
module Scope2
  def self.get_proc
    x = 42
    Proc.new do
      puts x
      puts self
      scope1_method
    end
  end
end
Scope1.instance_eval(&Scope2.get_proc)
Output:
42
Scope1
In scope1_method
How and why does this occur?
The Proc.new call creates a closure for the block that it's given. In creating a closure for the block, the block is bound to the original variables in the scope of the Proc.new call.
Why is this done?
It allows Ruby blocks to function as closures. Closures are extremely useful, and the Wikipedia entry (linked above) does an excellent job of explaining some of their applications.
How is this done?
This is done in the Ruby VM (in C code) by copying the Ruby control frame that exists before entering the Proc.new method. The block is then run in the context of this control frame. This effectively copies all of the bindings that are present in this frame. In Ruby 1.8, you can find the code for this in the proc_alloc function in eval.c. In Ruby 1.9, you can find this in the proc_new function in proc.c.
This behavior is by design. In Ruby, blocks, procs, and lambdas are lexical closures. Read this blog post for a short explanation of the differences between Ruby's three flavors of closure.
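A compact way to see the two halves of this behavior (locals come from the defining scope, self comes from wherever the block is evaluated) is the sketch below. The method `build_reporter` is a hypothetical name used only for this example.

```ruby
# Hypothetical helper: builds a proc that closes over a local variable.
def build_reporter
  x = 42
  proc { [x, self] }
end

REPORTER = build_reporter

# Called directly, self is whatever it was where the proc was created.
p REPORTER.call.first                        # prints 42

# instance_eval swaps self, but x still comes from build_reporter's scope.
p "other receiver".instance_eval(&REPORTER)  # prints [42, "other receiver"]
```

The local variable x outlives the call to `build_reporter` because the proc keeps its defining environment alive, while instance_eval only replaces the receiver.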

Finalizer not called before second object is created except when using weakref

I was playing around with ruby finalizers and noticed some behaviour that is very strange to me. I could reduce the triggering code to the following:
require "weakref"
class Foo
  def initialize
    ObjectSpace.define_finalizer(self, self.class.finalize)
  end
  def self.finalize
    proc {
      puts "finalizing"
    }
  end
end
Foo.new # does not work
#WeakRef.new(foo) # Using this instead, everything works as expected
sleep 1
ObjectSpace.garbage_collect
puts "... this did not finalize the object"
Foo.new
ObjectSpace.garbage_collect
puts "but this did?"
As the program says, no finalizer is run before the second call to Foo.new. I tried adding more delay before the first call to the garbage collector (though as I understand it, that shouldn't be necessary at all), but that doesn't change anything.
Strangely enough, if I use the commented-out line instead, the first finalizer gets called as I would expect. The second one is still not called before the program exits.
Can anyone explain why this is happening? I am running Ubuntu 12.10 with ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-linux]. I tried reading the weakref code, but as far as I can tell, all it does is store the object's object_id to retrieve it later.
edit:
I understand that manually invoking the garbage collector in a situation like this does not make sense. I'm just trying to understand the mechanics behind this.
You can't collect your Foo reference because it is referenced in your finalizer! Thus, because the finalizer itself is holding a reference to the object, the GC never collects it, and thus never triggers the finalizer. You can get around this by just using a WeakRef for the finalizer itself:
require "weakref"
class Foo
  class << self
    attr_accessor :objects_finalized
    def finalize
      proc {
        @objects_finalized ||= 0
        @objects_finalized += 1
      }
    end
  end
  def initialize
    ObjectSpace.define_finalizer WeakRef.new(self), self.class.finalize
  end
end
describe Foo do
  it "should be collected" do
    Foo.new
    expect { GC.start }.to change {
      ObjectSpace.each_object(Foo){} }.from(1).to(0)
  end
  it "should be finalized when it is collected" do
    expect { begin; Foo.new; end; GC.start }.to change {
      Foo.objects_finalized }.from(nil).to(1)
  end
end
With results:
% rspec weakref.rb
..
Finished in 0.03322 seconds
2 examples, 0 failures
I found the answer on http://edwinmeyer.com/Release_Integrated_RHG_09_10_2008/chapter05.html (search for "Registers and the Stack")
Because a reference to the object is still stored in a processor register, the garbage collector stays safe and assumes it's still alive.
Remember, unlike in languages like Objective-C or C++, where as soon as all references to an object are gone it disappears, Ruby is a garbage collected language. The interpreter has no reason to invoke the bulky inefficient garbage collector for one object. When the garbage collector runs, all other processing stops. That's a big performance hit. The interpreter is smart enough to wait until most of the garbage is out before collecting.
Example: Do you take a trash bag with one piece of garbage to the dumpster? No. You wait until it's full and then go.
If you want to force a garbage collection, call GC.start (or ObjectSpace.garbage_collect, as in the question) to invoke the collector manually. Don't use this in production, though, unless you have a very good reason.
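The general pitfall mentioned in this thread, a finalizer that keeps its own object alive, is usually avoided by building the finalizer proc somewhere the instance is not in scope, for example in a class method. The sketch below shows that pattern and checks what the proc's self is; the class `Resource` and the method `finalizer_for` are hypothetical names for this example. Whether and when the finalizer actually runs is still up to the garbage collector, so no output is promised.

```ruby
class Resource
  # Built in a class method: self here is the class, and only `name`
  # is captured, so the proc holds no reference to the instance.
  def self.finalizer_for(name)
    proc { |_object_id| warn "finalizing #{name}" }
  end

  def initialize(name)
    ObjectSpace.define_finalizer(self, self.class.finalizer_for(name))
  end
end

Resource.new("demo")

# The proc's receiver is the class, not an instance of it.
puts Resource.finalizer_for("check").binding.receiver.equal?(Resource)  # prints true

GC.start  # the finalizer may run here, or later, or only at exit
```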

Equivalent of Scheme's dynamic-wind in Ruby

Ruby has continuations... does it have a dynamic-wind construct like Scheme?
[This answer is written with Scheme programmers in mind (the OP has asked other Scheme questions here before, so that's a safe bet). If you're here because you're a Ruby programmer with no Scheme background, read the footnote for some context. :-)]
MRI doesn't (see below); and if MRI doesn't, that means there is no portable way to use any such functionality even if another implementation provides it.
I actually inspected the MRI 1.9.1 source code, just to be sure. In any case, here is some code to demonstrate that even the normal unwind protection (ensure) doesn't work correctly with continuations on MRI (tested with both 1.8.7 and 1.9.1). (It does work correctly with JRuby (I tested with 1.5), so it goes to show it's an implementation-specific thing. But note that JRuby only provides escape continuations, not general-purpose ones.)
callcc do |cc|
  begin
    puts 'Body'
    cc.call
  ensure
    puts 'Ensure'
  end
end
(To test with MRI 1.9+, you need to either run with the -rcontinuation option, or put require 'continuation' at the top of the file.)
For readers who don't know what a dynamic-wind is, it's a way to specify code to be run when the code being covered is exited (much like ensure), as well as code to be run when the covered code is re-entered. (This can happen when you use call/cc inside the covered code, and invoke the continuation object after the covered code has been exited.)
Totally contrived example:
def dynamic_wind pre, post, &block
  raise 'Replace this with a real implementation, kthx'
end
def redirect_stdout port, &block
  saved = $stdout
  set_port = lambda {$stdout = port}
  reset_port = lambda {$stdout = saved}
  dynamic_wind set_port, reset_port, &block
end
cc = nil
# cheap way to nuke all the output ;-)
# (opened for writing so that puts can actually write to it)
File.open '/dev/null', 'w' do |null|
  redirect_stdout null do
    # assign explicitly: on Ruby 1.9+ a block parameter named cc would shadow the outer cc
    callcc {|k| cc = k}
    puts 'This should not be shown'
  end
  puts 'This should be shown'
  cc.call
end
So, a properly-functioning dynamic_wind implementation would ensure that $stdout would be set back to the /dev/null stream when the continuation is invoked, so that at all instances where puts 'This should not be shown' is run, that text is indeed not shown.
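For contrast, here is what the unwind-only half looks like when written with plain ensure. This is a simplified stand-in, not a real dynamic-wind: the hypothetical `wind_once` runs `pre` once on entry and `post` on exit, but it does nothing when a continuation re-enters the block, which is exactly the part that is missing on MRI. A StringIO stands in for /dev/null so the swallowed output can be inspected.

```ruby
require "stringio"

# Simplified stand-in (hypothetical name): pre runs before the block and
# post runs after it, even on an exception. It does NOT re-run pre if the
# block is re-entered through a continuation.
def wind_once(pre, post)
  pre.call
  yield
ensure
  post.call
end

def redirect_stdout(port, &block)
  saved = $stdout
  wind_once(-> { $stdout = port }, -> { $stdout = saved }, &block)
end

SINK = StringIO.new
redirect_stdout(SINK) { puts "This should not be shown" }
puts "This should be shown"
```

After this runs, the first message ends up in `SINK.string` and only the second one reaches the real standard output.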

Resources