Understanding Ruby define_method with initialize - ruby

So, I'm currently learning about metaprogramming in Ruby and I want to fully understand what is happening behind the scenes.
I followed a tutorial where I included some of the methods in my own small project, an importer for CSV files and I have difficulties to wrap my hand around one of the methods used.
I know that the define_method method in Ruby exists to create methods "on the fly", which is great. Now, in the tutorial the method initialize to instantiate an object from a class is defined with this method, so basically it looks like this:
class Foo
def self.define_initialize(attributes)
define_method(:initialize) do |*args|
attributes.zip(args) do |attribute, value|
instance_variable_set("##{attribute}", value)
end
end
end
end
Next, in an initializer of the other class first this method is called with Foo.define_initialize(attributes), where attributes are the header row from the CSV file like ["attr_1", "attr_2", ...], so the *args are not provided yet.
Then in the next step a loop loops over the the data:
#foos = data[1..-1].map do |d|
Foo.new(*d)
end
So here the *d get passed as the *args to the initialize method respectively to the block.
So, is it right that when Foo.define_initialize gets called, the method is just "built" for later calls to the class?
So I theoretically get a class which now has this method like:
def initialize(*args)
... do stuff
end
Because otherwise, it had to throw an exception like "missing arguments" or something - so, in other words, it just defines the method like the name implies.
I hope that I made my question clear enough, cause as a Rails developer coming from the "Rails magic" I would really like to understand what is happening behind the scenes in some cases :).
Thanks for any helpful reply!

Short answer, yes, long answer:
First, let's start explaining in a really (REALLY) simple way, how metaprogramming works on Ruby. In Ruby, the definition of anything is never close, that means that you can add, update, or delete the behavior of anything (really, almost anything) at any moment. So, if you want to add a method to Object class, you are allowed, same for delete or update.
In your example, you are doing nothing more than update or create the initialize method of a given class. Note that initialize is not mandatory, because ruby builds a default "blank" one for you if you didn't create one. You may think, "what happens if the initialize method already exist?" and the answer is "nothing". I mean, ruby is going to rewrite the initialize method again, and new Foo.new calls are going to call the new initialize.

Related

Ruby: understanding data structure

Most of the Factorybot factories are like:
FactoryBot.define do
factory :product do
association :shop
title { 'Green t-shirt' }
price { 10.10 }
end
end
It seems that inside the ":product" block we are building a data structure, but it's not the typical hashmap, the "keys" are not declared through symbols and commas aren't used.
So my question is: what kind of data structure is this? and how it works?
How declaring "association" inside the block doesn't trigger a:
NameError: undefined local variable or method `association'
when this would happen on many other situations. Is there a subject in compsci related to this?
The block is not a data structure, it's code. association and friends are all method calls, probably being intercepted by method_missing. Here's an example using that same technique to build a regular hash:
class BlockHash < Hash
def method_missing(key, value=nil)
if value.nil?
return self[key]
else
self[key] = value
end
end
def initialize(&block)
self.instance_eval(&block)
end
end
With which you can do this:
h = BlockHash.new do
foo 'bar'
baz :zoo
end
h
#=> {:foo=>"bar", :baz=>:zoo}
h.foo
#=> "bar"
h.baz
#=> :zoo
I have not worked with FactoryBot so I'm going to make some assumptions based on other libraries I've worked with. Milage may vary.
The basics:
FactoryBot is a class (Obviously)
define is a static method in FactoryBot (I'm going to assume I still haven't lost you ;) ).
Define takes a block which is pretty standard stuff in ruby.
But here's where things get interesting.
Typically when a block is executed it has a closure relative to where it was declared. This can be changed in most languages but ruby makes it super easy. instance_eval(block) will do the trick. That means you can have access to methods in the block that weren't available outside the block.
factory on line 2 is just such a method. You didn't declare it, but the block it's running in isn't being executed with a standard scope. Instead your block is being immediately passed to FactoryBot which passes it to a inner class named DSL which instance_evals the block so its own factory method will be run.
line 3-5 don't work that way since you can have an arbitrary name there.
ruby has several ways to handle missing methods but the most straightforward is method_missing. method_missing is an overridable hook that any class can define that tells ruby what to do when somebody calls a method that doesn't exist.
Here it's checking to see if it can parse the name as an attribute name and use the parameters or block to define an attribute or declare an association. It sounds more complicated than it is. Typically in this situation I would use define_method, define_singleton_method, instance_variable_set etc... to dynamically create and control the underlying classes.
I hope that helps. You don't need to know this to use the library the developers made a domain specific language so people wouldn't have to think about this stuff, but stay curious and keep growing.

RSpec test method is called on `main` object

Sometimes we call methods on the ruby main objects. For example we call create for FactoryBot and we call _() for I18n.
What's a proper way to test these top level methods got called in RSpec?
For example, I want to test N_ is called, but it would not work because the self in Rspec and self in the file are different.
# spec
describe 'unfound_translations' do
it 'includes dynamic translations' do
expect(self).to receive(:N_)
load '/path/to/unfound_translations.rb')
end
end
# unfound_translations.rb
N_('foo')
However this does not pass.
Ok, I get your problem now. Your main issue is that self in it block is different that self inside unfound_translations.rb. So you're setting expectations on one object and method N_ is called on something completely different.
(Edit: I just realized, when reading the subject of this question again, that you already was aware of it. Sorry for stating the obvious... leaving it so it may be useful to others)
I managed to have a hacky way that is working, here it is:
# missing_translations.rb
N_('foo')
and the spec (I defined a simple module for tests inside it for simplicity):
module N
def N_(what)
puts what
end
end
RSpec.describe 'foo' do
let(:klass) do
Class.new do
extend N
end
end
it do
expect(klass).to receive(:N_)
klass.class_eval do
eval(File.read('missing_translations.rb'))
end
end
end
What it does it's creating an anonymous class that. And evaluating contents of missing_translations.rb inside means that klass is the thing that receives N_ method. So you can set expectations there.
I'm pretty sure you can replace extend N module with whatever module is giving you N_ method and this should work.
It's hacky, but not much effort so maybe good enough until more elegant solution is provided.

Weird Ruby class initialization logic?

Some open source code I'm integrating in my application has some classes that include code to that effect:
class SomeClass < SomeParentClass
def self.new(options = {})
super().tap { |o|
# do something with `o` according to `options`
}
end
def initialize(options = {})
# initialize some data according to `options`
end
end
As far as I understand, both self.new and initialize do the same thing - the latter one "during construction" and the former one "after construction", and it looks to me like a horrible pattern to use - why split up the object initialization into two parts where one is obviously "The Wrong Think(tm)"?
Ideally, I'd like to see what is inside the super().tap { |o| block, because although this looks like bad practice, just maybe there is some interaction required before or after initialize is called.
Without context, it is possible that you are just looking at something that works but is not considered good practice in Ruby.
However, maybe the approach of separate self.new and initialize methods allows the framework designer to implement a subclass-able part of the framework and still ensure setup required for the framework is completed without slightly awkward documentation that requires a specific use of super(). It would be a slightly easier to document and cleaner-looking API if the end user gets functionality they expect with just the subclass class MyClass < FrameworkClass and without some additional note like:
When you implement the subclass initialize, remember to put super at the start, otherwise the magic won't work
. . . personally I'd find that design questionable, but I think there would at least be a clear motivation.
There might be deeper Ruby language reasons to have code run in a custom self.new block - for instance it may allow constructor to switch or alter the specific object (even returning an object of a different class) before returning it. However, I have very rarely seen such things done in practice, there is nearly always some other way of achieving the goals of such code without customising new.
Examples of custom/different Class.new methods raised in the comments:
Struct.new which can optionally take a class name and return objects of that dynamically created class.
In-table inheritance for ActiveRecord, which allows end user to load an object of unknown class from a table and receive the right object.
The latter one could possibly be avoided with a different ORM design for inheritance (although all such schemes have pros/cons).
The first one (Structs) is core to the language, so has to work like that now (although the designers could have chosen a different method name).
It's impossible to tell why that code is there without seeing the rest of the code.
However, there is something in your question I want to address:
As far as I understand, both self.new and initialize do the same thing - the latter one "during construction" and the former one "after construction"
They do not do the same thing.
Object construction in Ruby is performed in two steps: Class#allocate allocates a new empty object from the object space and sets its internal class pointer to self. Then, you initialize the empty object with some default values. Customarily, this initialization is performed by a method called initialize, but that is just a convention; the method can be called anything you like.
There is an additional helper method called Class#new which does nothing but perform the two steps in sequence, for the programmer's convenience:
class Class
def new(*args, &block)
obj = allocate
obj.send(:initialize, *args, &block)
obj
end
def allocate
obj = __MagicVM__.__allocate_an_empty_object_from_the_object_space__
obj.__set_internal_class_pointer__(self)
obj
end
end
class BasicObject
private def initialize(*) end
end
The constructor new has to be a class method since you start from where there is no instance; you can't be calling that method on a particular instance. On the other hand, an initialization routine initialize is better defined as an instance method because you want to do something specifically with a certain instance. Hence, Ruby is designed to internally call the instance method initialize on a new instance right after its creation by the class method new.

A better way to call methods on an instance

My question has a couple layers to it so please bear with me? I built a module that adds workflows from the Workflow gem to an instance, when you call a method on that instance. It has to be able to receive the description as a Hash or some basic data structure and then turn that into something that puts the described workflow onto the class, at run-time. So everything has to happen at run-time. It's a bit complex to explain what all the crazy requirements are for but it's still a good question, I hope. Anyways, The best I can do to be brief for a context, here, is this:
Build a class and include this module I built.
Create an instance of Your class.
Call the inject_workflow(some_workflow_description) method on the instance. It all must be dynamic.
The tricky part for me is that when I use public_send() or eval() or exec(), I still have to send some nested method calls and it seems like they use 2 different scopes, the class' and Workflow's (the gem). When someone uses the Workflow gem, they hand write these method calls in their class so it scopes everything correctly. The gem gets to have access to the class it creates methods on. The way I'm trying to do it, the user doesn't hand write the methods on the class, they get added to the class via the method shown here. So I wasn't able to get it to work using blocks because I have to do nested block calls e.g.
workflow() do # first method call
# first nested method call. can't access my scope from here
state(:state_name) do
# second nested method call. can't access my scope
event(:event_name, transitions_to: :transition_to_state)
end
end
One of the things I'm trying to do is call the Workflow#state() method n number of times, while nesting the Workflow#event(with, custom_params) 0..n times. The problem for me seems to be that I can't get the right scope when I nest the methods like that.
It works just like I'd like it to (I think...) but I'm not too sure I hit the best implementation. In fact, I think I'll probably get some strong words for what I've done. I tried using public_send() and every other thing I could find to avoid using class_eval() to no avail.
Whenever I attempted to use one of the "better" methods, I couldn't quite get the scope right and sometimes, I was invoking methods on the wrong object, altogether. So I think this is where I need the help, yeah?
This is what a few of the attempts were going for but this is more pseudo-code because I could never get this version or any like it to fly.
# Call this as soon as you can, after .new()
def inject_workflow(description)
public_send :workflow do
description[:workflow][:states].each do |state|
state.map do |name, event|
public_send name.to_sym do # nested call occurs in Workflow gem
# nested call occurs in Workflow gem
public_send :event, event[:name], transitions_to: event[:transitions_to]
end
end
end
end
end
From what I was trying, all these kinds of attempts ended up in the same result, which was my scope isn't what I need because I'm evaluating code in the Workflow gem, not in the module or user's class.
Anyways, here's my implementation. I would really appreciate it if someone could point me in the right direction!
module WorkflowFactory
# ...
def inject_workflow(description)
# Build up an array of strings that will be used to create exactly what
# you would hand-write in your class, if you wanted to use the gem.
description_string_builder = ['include Workflow', 'workflow do']
description[:workflow][:states].each do |state|
state.map do |name, state_description|
if state_description.nil? # if this is a final state...
description_string_builder << "state :#{name}"
else # because it is not a final state, add event information too.
description_string_builder.concat([
"state :#{name} do",
"event :#{state_description[:event]}, transitions_to: :#{state_description[:transitions_to]}",
"end"
])
end
end
end
description_string_builder << "end\n"
begin
# Use class_eval to run that workflow specification by
# passing it off to the workflow gem, just like you would when you use
# the gem normally. I'm pretty sure this is where everyone's head pops...
self.class.class_eval(description_string_builder.join("\n"))
define_singleton_method(:has_workflow?) { true }
rescue Exception => e
define_singleton_method(:has_workflow?) { !!(puts e.backtrace) }
end
end
end
end
# This is the class in question.
class Job
include WorkflowFactory
# ... some interesting code for your class goes here
def next!
current_state.events.#somehow choose the correct event
end
end
# and in some other place where you want your "job" to be able to use a workflow, you have something like this...
job = Job.new
job.done?
# => false
until job.done? do job.next! end
# progresses through the workflow and manages its own state awareness
I started this question off under 300000 lines of text, I swear. Thanks for hanging in there! Here's even more documentation, if you're not asleep yet.
module in my gem

The name of the class that "newed" my class

In Ruby, is there a way to get the name of the class that creates an instance of MyClass?
I know that I could pass it in as an argument on my initialize method, but I want to know if any data is already there that has to do with the class that created an instance of MyClass in side of MyClass.
So it would be like
class MyClass
def initialize
#who_called_me = who_called_me.name
end
def who_called_me
puts #who_called_me
end
end
Although this is not portable between implementations and versions, this is a crude solution:
who_made_me=caller[3].split(':')[1][4..-2]
What this does is it gets the current stack, skips the strings for initialize, allocate, and new, and then gets the method name out of the string. Again, this is a total hack, and is based around unspecified behavior. I would not suggest using this unless absolutely necessary.
In general, this is evil. I've seen an equivalent in C#, but it produced violently cruel side effects, not to mention ugly-as-heck code.
In Ruby, if you really had to do this, you'd probably start with Kernel.caller. But please don't do that.

Resources