Ruby C extension: Is there a way to finalize?

I have been through every document on Ruby C extensions that I can find, without success.
Is there a counterpart to the Init_... function that initializes a C extension, one that is called when the interpreter exits?

Ruby code can use Kernel#at_exit.
at_exit { puts "This code runs when Ruby exits." }
The implementation of Kernel#at_exit in eval_jump.c calls a C function, rb_set_end_proc(). This function is public, so you can call it from your own C code. The declaration is
void rb_set_end_proc(void (*)(VALUE), VALUE);
The first argument is a pointer to your C function (to get called when Ruby exits). The second argument is a Ruby value to pass to your C function.
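Because Kernel#at_exit is implemented on top of rb_set_end_proc(), C callbacks and Ruby at_exit blocks go into the same list, so the Ruby-level ordering rule applies to both. Here is a minimal Ruby sketch of that rule: handlers run in reverse order of registration. The child process (IO.popen with RbConfig.ruby) is only there so the handlers fire in isolation; it is not part of the original answer.

```ruby
require "rbconfig"

# Child script: two exit handlers registered one after the other.
script = <<~'RUBY'
  at_exit { puts "registered first, runs last" }
  at_exit { puts "registered second, runs first" }
  puts "main program done"
RUBY

# Run the script in a separate Ruby process and capture its stdout.
output = IO.popen([RbConfig.ruby, "-e", script], &:read)
puts output
```

If your extension registers more than one end proc, keep this last-in, first-out order in mind when one cleanup step depends on another.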

There is no general "interpreter exiting" hook. However, on a normal exit Ruby runs the finalizers of all remaining objects, including Module and Class objects, and you can hook into that with ObjectSpace.define_finalizer. So you could adapt the following code, which works equally for objects defined in Ruby or by a C library:
module MyLib
end
ObjectSpace.define_finalizer( MyLib, proc { puts "MyLib unloaded" } )
Take care not to assume that other Module or Class objects you rely on still exist when this code runs: you are not in full control of the order in which finalizers are called on program exit.
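To watch the finalizer actually fire during shutdown, the sketch below runs the answer's snippet in a child Ruby process. The extra at_exit line and the IO.popen/RbConfig setup are additions for illustration only. Treat the relative order of the "at_exit handler" and "MyLib finalized" lines as an implementation detail, in keeping with the warning about ordering.

```ruby
require "rbconfig"

# Child script: a finalizer attached to a module object, plus an at_exit handler.
script = <<~'RUBY'
  module MyLib; end
  # The finalizer proc receives the object_id of the object being finalized.
  ObjectSpace.define_finalizer(MyLib, proc { |_id| puts "MyLib finalized" })
  at_exit { puts "at_exit handler" }
  puts "main done"
RUBY

# Capture everything the child prints, including output produced during shutdown.
out = IO.popen([RbConfig.ruby, "-e", script], &:read)
puts out
```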

Related

Ruby anonymous classes as first-class functions

Ruby does not have first-class functions; although it has procs and lambdas, these notoriously require significant overhead. (Python has first-class functions, apparently without the overhead.)
It occurred to me that first-class functions can be simulated with a little more work by using anonymous classes, as follows:
f = Class.new { def self.f; puts 'hi'; end }
def g(fun); fun.f; end
g(f)
# prints "hi"
Does anyone know a better way?
In fact, Ruby doesn't have functions at all, only methods. So if you want to pass a method to another method, you can do this:
def g(f)
f.call
end
g('123'.method(:to_i))
This is less concise than Python, but it's the price Ruby pays for being able to omit parentheses in method calls. I think omitting parentheses is one of the things that makes Ruby shine, because it makes implementing DSLs in pure Ruby a lot easier.
Ruby has procs and lambdas (both instances of the Proc class), and Methods, all of which approximate first-class functions. Lambdas are the closest to a true first-class function: they check the number of arguments when called and create a new call context such that return just returns from the lambda. In contrast, procs are just reified blocks of code; they don't check their number of arguments, and a return causes the enclosing method to return, not just the proc.
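A short sketch of the two differences described above, arity checking and the meaning of return. The method names lambda_return and proc_return are made up for illustration.

```ruby
strict  = lambda { |a, b| [a, b] }
lenient = proc   { |a, b| [a, b] }

lenient_result = lenient.call(1)   # => [1, nil]: the missing argument becomes nil

# Lambdas enforce their arity, so this raises ArgumentError.
strict_error = begin
  strict.call(1)
  nil
rescue ArgumentError => e
  e
end

def lambda_return
  l = lambda { return :from_lambda }
  l.call                            # return exits only the lambda
  :after_lambda
end

def proc_return
  pr = proc { return :from_proc }
  pr.call                           # return exits proc_return itself
  :after_proc                       # never reached
end

p lambda_return   # prints :after_lambda
p proc_return     # prints :from_proc
```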
Method objects allow you to store an uncalled method in a variable, complete with implied invocant. There's no syntax for creating an anonymous Method, but you said first-class functions, not anonymous ones. Other than the invocant, they are basically lambdas whose body is that of the referenced method.
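The sketch below shows a Method object carrying its receiver, the same method re-bound to a different receiver via unbind/bind, and a Method passed where a block is expected using &. The helper shout is hypothetical.

```ruby
to_i = "123".method(:to_i)              # the receiver "123" is stored with the method
number = to_i.call                      # => 123

rebound = to_i.unbind.bind("456").call  # => 456: same method, different receiver

def shout(word)
  word.upcase + "!"
end

shouter = method(:shout)
shouted = %w[hi there].map(&shouter)    # & converts the Method to a Proc
is_lambda = shouter.to_proc.lambda?     # Method#to_proc produces a lambda

p number, rebound, shouted, is_lambda
```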
I'm not sure what an anonymous class gets you that is better than the above solutions, but it is certainly further away from a true first-class function. It's more like the way we had to approximate them in Java before closures were added to the language.

Why does Pry format these return values differently?

In the first statement below, Pry returns a normal-looking object.
In the second, Pry marks the object as a lambda and also adds #(pry) along with the line number inside the Pry session (:37). Why doesn't the first return value contain #(pry)? Or, conversely, why does the second return value contain it?
{}.to_proc
# => #<Proc:0x9b3fed0>
lambda {}
# => #<Proc:0x97db9c4#(pry):37 (lambda)>
In the second example, the lambda is created from a block written right there in Ruby code, so it gets that source location.
In the first example, the proc is created by a method implemented in C (to_proc). C code is compiled into the Ruby interpreter binary, so it would make no sense to report a C location in place of a Ruby source location. In fact, you don't get a source location for the method either (which is not the same as the source location of the proc it generates, but would be close to it if both were given):
{}.method(:to_proc).source_location # => nil
However, if the method is written in Ruby, you do get the source location:
irb(main):001:0> def to_proc
irb(main):002:1> Proc.new{}
irb(main):003:1> end
=> :to_proc
irb(main):004:0> {}.to_proc
=> #<Proc:0x007f387602af70#(irb):2>
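You can check the distinction directly with source_location, which exists on both Method and Proc. A small sketch; ruby_defined is a throwaway name, and the nil result assumes Hash#to_proc is implemented in C, as it is in CRuby.

```ruby
def ruby_defined; end

ruby_loc  = method(:ruby_defined).source_location  # [file, line] of the def above
c_loc     = {}.method(:to_proc).source_location    # nil: the method is written in C
l         = lambda {}
block_loc = l.source_location                      # [file, line] where the block appears

p ruby_loc, c_loc, block_loc
```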
This doesn't have anything to do with Pry. This is what you get when you call inspect on these two Procs.
I'm not 100% sure, but I have a theory. In the second example, you're passing a block to lambda. Although you don't have any code inside the block, you usually would, and when debugging (which is what inspect is ordinarily used for) line numbers are important.
In the first example, though, there's no block. You're calling Hash#to_proc on an empty Hash (the emptiness is irrelevant; you get the same result with Symbol#to_proc etc.), so there's no code to associate a line number with; a line number wouldn't even make sense.
You can see where this happens in the proc_to_s function in proc.c, by the way.

Understanding Ruby and Sinatra syntax

I need to decipher some Ruby code. Being a Python dev, I am having a hard time making sense of some of the syntax.
I need to deal with some (mostly clean and readable) Sinatra code. I started with a Sinatra tutorial, and it looks something like this:
get '/' do
"Hello, World!"
end
Now, I know that in Ruby you don't need parentheses to call a function. So if I were to try to understand the above, I would say:
get is a function that takes as its first argument the route.
'/' is the first argument
do ... end block is an anonymous function
Please correct me if I am wrong above, and explain in detail anything I might be missing.
Also they say that Sinatra is a DSL -- does this mean that it is parsing some special syntax that is not official Ruby?
do ... end (or { ... }) is a block, a very important concept in Ruby. It was noticed that functions which take other functions as parameters (map, filter, grep, timeout...) very often accept just a single function, so Ruby's designer created a special syntax for that case.
It is often said that in Ruby, everything is an object. This is not quite true: code is not an object. Code can be wrapped into an object. But Ruby blocks are pure code: not an object, not a first-class value at all. A block is a piece of code associated with a method call.
Your code snippet is equivalent to this:
self.get('/') do
"Hello, World!"
end
The get method takes one parameter and a block, not two parameters. In a hypothetical example where get did take two parameters, we would have to write something like this:
get('/', lambda { "Hello, World" })
or
get('/', Proc.new { "Hello, World" })
but notice that the way we wrap code into objects involves calling methods lambda and Proc.new - and giving them a block (and zero parameters)!
There are many tutorials on "Ruby blocks", so I will not link any particular one.
Because of the block syntax, Ruby is very good at making dialects (still fully syntactic Ruby) that express certain concepts very neatly. Sinatra uses the get... "syntax" (but actually just a method call) to describe a web server; Rake uses task... "syntax" to describe build processes; RSpec, a testing framework, has its own DSL (that is still Ruby) that describes desired behaviours.
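To make the "just a method call" point concrete, here is a toy, Sinatra-flavored router. MiniApp, build and call are hypothetical names, and this is not how Sinatra is implemented; it only sketches how get '/' do ... end can be an ordinary method call that receives a block.

```ruby
# Hypothetical toy router; Sinatra's real internals differ.
class MiniApp
  def initialize
    @routes = {}
  end

  # One positional argument plus a block; &handler turns the block into a Proc.
  def get(path, &handler)
    raise ArgumentError, "get needs a block" unless handler
    @routes[path] = handler
  end

  # Look up the stored block for a path and run it.
  def call(path)
    handler = @routes[path]
    return "404 Not Found" unless handler
    handler.call
  end

  # Evaluate the DSL block with self set to a new app, so bare `get` calls hit MiniApp#get.
  def self.build(&definition)
    app = new
    app.instance_eval(&definition)
    app
  end
end

app = MiniApp.build do
  get '/' do
    "Hello, World!"
  end
end

puts app.call('/')      # prints Hello, World!
puts app.call('/nope')  # prints 404 Not Found
```

The instance_eval trick is what lets the route definitions read like keywords while staying plain Ruby.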
After some reading, I understood the code blocks.
Ruby code blocks are simple. They are closures. There are two ways to write a block:
do |x|
do_something(x)
end
{|x| do_something(x) }
The |x| is the argument that gets passed to the code within the block.
The crucial bit to grasping code blocks is to understand how they are used with methods.
In Ruby, methods are a bit different.
In addition to arguments, any method can accept a code block.
Code blocks are NOT arguments; they are a separate entity that can be passed to a method along with the arguments.
A method can choose not to call the code block, in which case any code block that was passed is ignored.
If a method yields to a code block, the caller must pass one; otherwise Ruby raises a LocalJumpError ("no block given (yield)").
yield within a method executes the code block
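The rules above in one sketch: yield, the error when no block is passed, block_given? for optional blocks, and &block for turning a block into a Proc. The method names are illustrative only.

```ruby
# A method that hands two values to whatever block it is given.
def twice
  yield 1
  yield 2
end

collected = []
twice { |x| collected << x * 10 }   # collected is now [10, 20]

# Calling a yielding method without a block raises LocalJumpError.
jump_error = begin
  twice
  nil
rescue LocalJumpError => e
  e
end

# block_given? lets a method treat the block as optional.
def maybe_yield
  if block_given?
    yield
  else
    "no block"
  end
end

without_block = maybe_yield              # => "no block"
with_block    = maybe_yield { "block" }  # => "block"

# An explicit &block parameter turns the block into a Proc you can store or pass on.
def capture(&block)
  block
end

stored = capture { |x| x + 1 }
puts stored.call(41)                     # prints 42
```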
For more on code blocks read this: http://mixandgo.com/blog/mastering-ruby-blocks-in-less-than-5-minutes

Ruby Blocks, Procs and Local Variables

In Ruby, procs seem to have access to local variables that were present at the time they were declared, even if they are executed in a different scope:
module Scope1
def self.scope1_method
puts "In scope1_method"
end
end
module Scope2
def self.get_proc
x = 42
Proc.new do
puts x
puts self
scope1_method
end
end
end
Scope1.instance_eval(&Scope2.get_proc)
Output:
42
Scope1
In scope1_method
How and why does this occur?
The Proc.new call creates a closure for the block that it's given. In creating a closure for the block, the block is bound to the original variables in the scope of the Proc.new call.
Why is this done?
It allows Ruby blocks to function as closures. Closures are extremely useful, and the Wikipedia article on closures does an excellent job of explaining some of their applications.
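A classic illustration, as a minimal sketch (make_counter is a made-up name): two lambdas share a local variable that outlives the method that created it, and a proc sees later assignments to a captured variable, because it holds the variable itself rather than a snapshot of its value.

```ruby
# Each call to make_counter creates a fresh local `count` captured by two lambdas.
def make_counter
  count = 0
  increment = -> { count += 1 }
  peek      = -> { count }
  [increment, peek]
end

inc, peek = make_counter
inc.call
inc.call
puts peek.call   # prints 2: both lambdas share the same variable

# The closure captures the variable, not its value at creation time.
x = 1
show_x = proc { x }
x = 2
puts show_x.call # prints 2
```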
How is this done?
This is done in the Ruby VM (in C code) by copying the Ruby control frame that exists before entering the Proc.new method. The block is then run in the context of this control frame. This effectively copies all of the bindings that are present in this frame. In Ruby 1.8, you can find the code for this in the proc_alloc function in eval.c. In Ruby 1.9, you can find this in the proc_new function in proc.c.
This behavior is by design. In Ruby, blocks, procs, and lambdas are lexical closures. Read this blog post for a short explanation of the differences between Ruby's three flavors of closure.

When does Ruby know that a method exists?

One question that ran through my mind was: how does the Ruby interpreter know that a method exists on an object if its definition has yet to be interpreted? Wouldn't it matter whether you define the method first and then use it, rather than use it and then define it?
It doesn't know, and it doesn't care, until execution. When a method call is executed, the interpreter checks whether the receiver's class (the class object at run time, not the source code!) has a method with that name. If it does not, it searches up the ancestor chain. If no ancestor defines it either, it calls method_missing. If method_missing has not been overridden, the default implementation raises NoMethodError.
If a method call is never executed, you will not get any errors.
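This also answers the "define first or use first" part of the question: a method body may mention a method that does not exist yet, because nothing is looked up until the call happens. A sketch with hypothetical names greet and polite_greeting:

```ruby
# greet refers to polite_greeting before polite_greeting exists; defining greet is fine.
def greet
  polite_greeting
end

# Calling greet now fails, because the lookup happens at call time.
early_error = begin
  greet
  nil
rescue NameError => e   # NoMethodError is a subclass, so this catches either
  e
end

def polite_greeting
  "Good day!"
end

puts greet   # prints Good day!
```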
The interpreter doesn't know about undefined methods ahead of time, for example:
o = Object.new
o.foo # => Raises NoMethodError.
class Object
def foo
puts "Foo!"
end
end
o.foo # => prints "Foo!", since the method is defined.
However, Ruby has a neat feature called method_missing, which lets the receiver of a method call take the method name and arguments as separate arguments and handle them accordingly, as long as no defined method already handles the call.
def o.method_missing(sym, *args)
puts "OK: #{sym}(#{args.inspect})"
# Do something depending on the value of 'sym' and args...
end
o.bar(1, 2, 3) # prints "OK: bar([1, 2, 3])"
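When method_missing is used to fake "dynamically defined" methods, the usual companion is respond_to_missing?, so that respond_to? agrees with what the object actually handles, and super keeps NoMethodError for everything else. A sketch with a hypothetical TinyStore class offering find_by_<attribute> lookups; it is not ActiveRecord's implementation.

```ruby
# Hypothetical store with dynamic find_by_<attribute> methods; not ActiveRecord's real code.
class TinyStore
  FINDER = /\Afind_by_(\w+)\z/

  def initialize(records)
    @records = records
  end

  def method_missing(name, *args)
    attr = name.to_s[FINDER, 1]
    if attr
      @records.find { |record| record[attr.to_sym] == args.first }
    else
      super # anything else still raises NoMethodError
    end
  end

  # Keep respond_to? consistent with what method_missing handles.
  def respond_to_missing?(name, include_private = false)
    FINDER.match?(name.to_s) || super
  end
end

store = TinyStore.new([{ name: "Ann", age: 31 }, { name: "Bob", age: 27 }])
bob = store.find_by_name("Bob")        # the Bob record
nobody = store.find_by_age(99)         # nil: no record matches
puts store.respond_to?(:find_by_age)   # prints true
```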
"Method missing" is used by things like active record find methods and other places where it could make sense to have "dynamically defined" functions.
The problem is that the interpreter looks the method up when you call it, and if the method isn't defined yet at that point, the call fails.
In (some) compiled languages this doesn't matter, because while compiling, the compiler can say "I'll look for this on a second pass", but I don't think this is the case with Ruby.
