Ruby modules and Module#append_features explanation - ruby

Lately I bumped into this very interesting post: http://opensoul.org/blog/archives/2011/02/07/concerning-activesupportconcern/ which walks through (and explains) the ActiveSupport::Concern source code.
A few questions arose, but the most important was this:
There's a method called append_features whose documentation says: "Ruby’s default implementation of this method will add constants, methods, and variables of this module to the base module".
I always thought that modules work the same way as classes in terms of the method lookup chain; the only differences are that you can't instantiate a module and that it is not the 'superclass' of the including class (since a module is not actually a class). That is, when a class includes a module, the module is simply added as a direct parent in the class's inheritance hierarchy, and as a result, methods that are missing in the including class will be looked up in the module.
But if that's the case, what does it mean that append_features actually "adds methods to the base module"? And does it mean that you can prevent this behaviour by overriding the method (which ActiveSupport::Concern actually does)?
Can someone create some order in my head?

Basically, append_features is, or should be considered, a deeply internal Ruby method.
Module#include is defined (in "eval.c", under the name rb_mod_include) as a loop that simply calls mod.append_features (and then mod.included) for every module argument passed to it.
The default append_features implementation (rb_mod_append_features in "eval.c") calls rb_include_module, and this is the function that does the real job.
(Actually, the really real job is done by include_modules_at, a few lines below.)
This means you are perfectly right: you can prevent or break this basic Ruby functionality by overriding append_features (at least if you don't call super).
ActiveSupport::Concern does call super; in some cases it just postpones the actual call until the "concerned" module is included by a "non-concerned" one.
It's usually better to override the included method instead of append_features. The default included just returns nil, so the probability of breaking anything is smaller. That is also what the documentation of the included method advises.
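To make the difference concrete, here is a minimal sketch. The names Silent, Hooked and Host are made up for illustration; only append_features, included and super are Ruby's own.

```ruby
# A module that overrides append_features WITHOUT calling super:
# include still returns normally, but nothing is added to the ancestor chain.
module Silent
  def self.append_features(base)
    # no super here, so Ruby's default work is skipped
  end

  def hello
    "hello from Silent"
  end
end

# A module that uses the included hook instead: the default
# append_features still runs, and the hook is only a notification.
module Hooked
  def self.included(base)
    (@includers ||= []) << base
  end

  def self.includers
    @includers || []
  end

  def hello
    "hello from Hooked"
  end
end

class Host
  include Silent
  include Hooked
end

p Host.ancestors.include?(Silent)  # false
p Host.ancestors.include?(Hooked)  # true
p Host.new.hello                   # "hello from Hooked"
p Hooked.includers                 # [Host]
```

Silent never shows up in Host.ancestors, which is exactly the kind of breakage the answer warns about; Hooked gets both the normal behaviour and its callback.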

Related

What are the *actual* steps in ruby's method lookup?

I've read stackoverflow posts on this topic as well as several articles which include A Primer on Ruby Method Lookup, What is the method lookup path in Ruby. In addition, I checked out the object model chapter in Ruby Metaprogramming 2, asked in a few chat rooms, and made this reddit thread. Short of learning C, I've done what I can to figure this out.
As described by the resources above, these 6 places are checked (in order) during method lookup on a receiving object like fido_instance:
1. singleton class of fido_instance
2. IClass (from an extended module)
3. IClass (from a prepended module)
4. class
5. IClass (from an included module)
6. superclass (if method isn't found here, repeat steps 4-6)
Obviously, the diagram is incomplete, and all of these singleton classes might not have been created in the real world. Still, those 6 steps leave a lot to be desired, and don't cover the following scenario. If there were no extended/prepended IClass above the singleton class of fido_instance, then there's no explanation of whether step 4 is executed on the singleton class of fido_instance. I have to assume not since the whole method lookup would short circuit.
If I were to guess a set of steps that could explain ruby's method lookup behavior, it might look like:
1. check fido_instance.class for the method. (obviously, ruby isn't going to use its own #class method to do the method lookup, but it conveys the logic of the process)
2. check fido_instance.class.superclass for the method. Keep adding .superclass and checking for the method until no superclasses are left. (again, ruby isn't going to use its own #superclass method)
3. method wasn't found. Start at step 1, looking for #method_missing this time.
I also recall reading that there's a separate method lookup process if the receiving object is a class, but I can't recall where.
So what's the correct, detailed explanation that doesn't involve knowing C?
There's a ... gem ... in that second ref that I think gets to the core of the answer: ancestors of the singleton class. Applied to your object, it would be:
fido_instance.singleton_class.ancestors
This will always give you the order of method lookup that Ruby uses. It's pretty simple when you view it this way, and that's the bottom line answer to your question. Ruby will start at the singleton_class and work its way up the ancestors looking for that method. Using your diagram:
fido.singleton_class.ancestors
=> [Fetch, WagTail, DogClass, Object, Kernel, BasicObject]
(Note1: Bark is not part of this output because you used extend instead of include. More on this in a second.)
(Note2: If it doesn't find it all the way up to BasicObject, then it will call method_missing up the same ancestry chain.)
It's no different when calling a method on a class, because in Ruby a class is just an instance of class Class. So DogClass.method1 will search for method1 on DogClass.singleton_class and then up its ancestry chain, just like before.
DogClass.singleton_class.ancestors
=> [Bark, Class, Module, Object, Kernel, BasicObject]
Since you used extend for Bark, this is where we find it! So if Bark defined a method bark, then you can call DogClass.bark because that method is defined in DogClass's singleton_class' ancestors.
To understand what that ancestry tree will be (instead of relying on printing it out every time), you simply need to know how the ancestry is modified by subclassing, extend, include, prepend, etc.
Subclassing gives the child class the entire ancestry chain of its superclass.
Including a module in a class C adds that module into the ancestry chain after C and before everything else.
Prepending a module in a class C adds that module into the ancestry chain before everything, including C and any currently prepended modules.
def x.method1 adds method1 to x.singleton_class. Similarly x.extend(M) will add M to the ancestry of x.singleton_class (but not to x.class). Note that the latter is exactly what happened with Bark and DogClass.singleton_class, but can equally apply to any object.
extend does not get an item of its own in the list above because it does not modify the ancestry chain of the object's class. It does modify the ancestry of that object's singleton_class: as we saw, Bark was included in DogClass.singleton_class.ancestors.
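The rules above can be checked with a small sketch. The names here (Animal, Dog, Walk, Loud, Tricks) are invented for the example and are not the ones from the question's diagram.

```ruby
module Walk;   end
module Loud;   end
module Tricks; end

class Animal; end

class Dog < Animal   # subclassing: Dog gets Animal's whole chain
  include Walk       # lands right after Dog
  prepend Loud       # lands in front of Dog
end

fido = Dog.new
fido.extend(Tricks)  # only touches fido's singleton class

# Keep only the names defined here, so the result does not depend on
# the Ruby version or on whatever else happens to be loaded.
ours = [Loud, Dog, Walk, Animal, Tricks]

p Dog.ancestors & ours
# [Loud, Dog, Walk, Animal]

p fido.singleton_class.ancestors & ours
# [Tricks, Loud, Dog, Walk, Animal]

p Dog.ancestors.include?(Tricks)
# false
```

Reading the second array left to right gives the order in which Ruby would look for a method called on fido.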
Tangent:
The bit about class methods above is the key to me for understanding how important singleton classes are to Ruby. You obviously can't define bark on DogClass.class, because DogClass.class == Class and we don't want bark on Class! So how can we allow DogClass to be an instance of Class, allowing it to have a (class) method bark that is defined for DogClass but not unrelated classes? Using the singleton class! In this way, defining a "class method", like by def self.x inside class C, is sort of like C.singleton_class.send(:define_method, :x) {...}.
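A small sketch of that last point, using a made-up Dog class: both spellings put the method on Dog's singleton class, and nothing leaks onto Class or onto unrelated classes.

```ruby
class Dog
  # The usual way to write a "class method".
  def self.bark
    "Woof"
  end
end

# Roughly the same thing, spelled out through the singleton class.
Dog.singleton_class.send(:define_method, :howl) { "Awoo" }

p Dog.bark                            # "Woof"
p Dog.howl                            # "Awoo"
p Dog.singleton_methods(false).sort   # [:bark, :howl]
p Class.method_defined?(:bark)        # false
p String.respond_to?(:bark)           # false
```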

Why does the Ruby module Kernel exist?

In the book OO Design in Ruby, Sandi Metz says that the main use of modules is to implement duck types with them and include them in every class needed. Why is the Ruby Kernel a module included in Object? As far as I know it isn't used anywhere else. What's the point of using a module?
Ideally,
Methods in spirit (that are applicable to any object), that is, methods that make use of the receiver, should be defined on the Object class, while
Procedures (provided globally), that is, methods that ignore the receiver, should be collected in the Kernel module.
Kernel#puts, for example doesn't do anything with its receiver; it doesn't call private methods on it, it doesn't access any instance variables of it, it only acts on its arguments.
Procedures in Ruby are faked by using Ruby's feature that a receiver that is equal to self can be omitted. They are also often made private to prevent them from being called with an explicit receiver and thus being even more confusing. E.g., "Hello".puts would print a newline and nothing else since puts only cares about its arguments, not its receiver. By making it private, it can only be called as puts "Hello".
In reality, due to the long history of Ruby, that separation hasn't always been strictly followed. It is also additionally complicated by the fact that some Kernel methods are documented in Object and vice versa, and even further by the fact that when you define something which looks like a global procedure, and which by the above reasoning should then end up in Kernel, it actually ends up as a private instance method in Object.
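Both observations can be checked from a plain script. This sketch assumes it runs at the top level of a file (not inside irb or another wrapper); shout is a made-up method name.

```ruby
# Kernel#puts is a private instance method, so an explicit receiver is rejected.
p Kernel.private_instance_methods.include?(:puts)   # true

rejected = false
begin
  "Hello".puts
rescue NoMethodError
  rejected = true
end
p rejected                                           # true

# Something that looks like a global procedure...
def shout(text)
  text.upcase + "!"
end

# ...is stored as a private instance method of Object, not of Kernel.
p Object.private_instance_methods(false).include?(:shout)  # true
p Kernel.private_instance_methods(false).include?(:shout)  # false
p shout("hi")                                              # "HI!"
```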
As you already pointed out, modules provide a way to collect and structure behavior, and so does the Kernel module. This module is mixed into the class Object early on, so every Ruby class provides these methods. Only BasicObject comes before it in the hierarchy; the purpose of its child, Object, is essentially to be extended with the Kernel methods. BasicObject has only a handful of very basic methods, such as __send__ or __id__.
class Object < BasicObject
  include Kernel # all those many default methods we appreciate :)
end

Include or Extend When in the Main Scope

All over the Internet, I see people using "include" to bring new functionality to the main scope.
Most recently, I saw this in an SO answer:
require 'fileutils' #I know, no underscore is not ruby-like
include FileUtils
# Gives you access (without prepending by 'FileUtils.') to
cd(dir, options)
cd(dir, options) {|dir| .... }
pwd()
Extend works too, but from my understanding, ONLY extend should work. I don't even know why the main object has the include functionality.
Inside the main scope:
self.class == Object #true
Object.new.include #NoMethodError: undefined method `include' for #<Object:0x000000022c66c0>
And I mean logically: since self.is_a?(Module) == false inside the main scope, main shouldn't even have the include functionality. include is used to add methods to child instances, and main isn't a class or a module, so there are no child instances to speak of.
My question is: why does include work in the main scope, and what was the design decision that led to making it work there? And lastly, shouldn't we prefer "extend" in that case, so as not to make the "extend/include" functionality even more confusing than it already may be, given that people often use Ruby hooks to invoke one when the other is called?
The main scope is "special". Instance methods defined at the main scope become private instance methods of Object, constants become constants of Object, and include includes into Object. There's probably other things I'm missing. (E.g. presumably, prepend prepends to Object, I never thought about it until now.)
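The practical difference between the two at the main scope can be seen in a short sketch. It assumes the code runs at the top level of a script; Greeter and Helper are hypothetical modules.

```ruby
module Greeter
  def greet
    "hi from #{self.class}"
  end
end

module Helper
  def assist
    "assisting"
  end
end

p self.is_a?(Module)             # false, main is a plain Object

include Greeter                  # at the main scope this includes into Object
extend Helper                    # this only touches main itself

p Object.include?(Greeter)       # true
p "text".greet                   # "hi from String"

p Object.include?(Helper)        # false
p assist                         # "assisting"
p "text".respond_to?(:assist)    # false
```

So include at the main scope changes every object in the program, while extend keeps the new methods on main alone, which is one argument for preferring extend in scripts.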

Ruby and some brain teaser

At the office, we had this little brain teaser:
class Bicycle
  def spares
    {tire_size: 21}.merge(local_spares)
  end
  def local_spares
    {}
  end
end
class RoadBike < Bicycle
  def local_spares
    {tape_color: 'red'}
  end
end
roadbike = RoadBike.new
roadbike.spares
Most of us couldn't tell what roadbike.spares returns without running the whole code in the console. We each had a different hunch about the behaviour, but can anyone break down for me what really happens here?
If anyone's wondering, the output is {tire_size: 21, tape_color: 'red'}
It's quite obvious: RoadBike#spares (which is the same as Bicycle#spares, because RoadBike doesn't override this method) internally calls RoadBike#local_spares, merges its return value into the {tire_size: 21} hash, and returns the result.
No surprise at all.
This is called method overriding. The RoadBike#local_spares method overrides the Bicycle#local_spares method because RoadBike inherits from Bicycle.
When you send a message to an object, Ruby will try to find a method with the same name as the message to execute. It first looks at the object's class, then at that class's superclass, then at that class's superclass and so on.
When you send a RoadBike object the spares message, it will first try (and fail) to find a method named spares in RoadBike. Then it will look into its superclass (Bicycle) and succeed.
The body of that method contains a message send of local_spares to the receiver object. Again, Ruby tries to find a method named local_spares in the class of the object (still RoadBike) and succeeds, so it executes that method.
This is all just standard inheritance and method overriding. There's nothing really special or surprising or "brain teaserish" about that. In fact, this is pretty much the whole point of inheritance and method overriding: that more specialized objects can provide more specialized implementations than their more general parents.
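One way to see which class each method was found in is to ask the method object for its owner. This sketch reuses the classes from the question; Object#method and Method#owner are standard Ruby.

```ruby
class Bicycle
  def spares
    { tire_size: 21 }.merge(local_spares)
  end

  def local_spares
    {}
  end
end

class RoadBike < Bicycle
  def local_spares
    { tape_color: 'red' }
  end
end

roadbike = RoadBike.new

# Where does Ruby find each method for this receiver?
p roadbike.method(:spares).owner         # Bicycle
p roadbike.method(:local_spares).owner   # RoadBike

p roadbike.spares == { tire_size: 21, tape_color: 'red' }  # true
p Bicycle.new.spares == { tire_size: 21 }                  # true
```

The lookup for local_spares starts again from the receiver's class, not from the class where spares happens to be defined, which is why the RoadBike version wins.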
Note: the method lookup algorithm is in reality a bit more involved than that.
First off, what happens if there is no more superclass, and the method still hasn't been found? In that case, Ruby will send the message method_missing to the receiver and pass along the name of the method that it tried to look up. Only if the method_missing method also can't be found, will Ruby raise a NoMethodError.
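A minimal sketch of that fallback, with a made-up Ghost class. Calling super inside method_missing hands the message on to the default implementation, which is what ends in a NoMethodError.

```ruby
class Ghost
  # Called only after the normal lookup has failed everywhere.
  def method_missing(name, *args)
    if name.to_s.start_with?("say_")
      "#{name.to_s.sub('say_', '')}!"
    else
      super   # fall through to the default behaviour
    end
  end

  # Keeps respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?("say_") || super
  end
end

g = Ghost.new
p g.say_boo                # "boo!"
p g.respond_to?(:say_boo)  # true

begin
  g.unknown
rescue NoMethodError => e
  p e.name                 # :unknown
end
```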
And secondly, there are singleton classes, included modules, prepended modules, and Refinements to consider. So, really Ruby will look at the object's singleton class first, before it looks at the class and then the superclass and so on. And "look at the class" actually means look first at the prepended modules (in reverse order), then at the class itself, and then at the included modules (again in reverse order). Oh, and that has to be done recursively as well, so for each prepended module look first at the prepended modules of that module, then at the module itself, then the included modules and so forth.
(Oh, and Refinements throw another wrinkle in this, obviously.)
Most of the Ruby implementations simplify this algorithm greatly by separating their internal notion of what a "class" is from the programmer's notion, by introducing the concept of "hidden classes" (YARV calls them "virtual classes") that exist inside the implementation but aren't exposed to the programmer. So, for example, the singleton class of an object would be a hidden class, and the class pointer of the object would simply point to the singleton class and the superclass pointer of the singleton class would point to the actual class of the object. When you include a module into a class, the implementation will synthesize a hidden class (which YARV calls an "include class") for the module and insert it as the superclass of the class and make the former superclass the superclass of the hidden class. Methods like Object#class and Class#superclass would then simply follow the superclass chain until they find the first non-hidden class and return that, instead of returning the class/superclass pointer directly.
This makes methods like Object#class, Class#superclass and Module#ancestors slightly more complex, because they have to skip hidden classes, but it simplifies the method lookup algorithm, which is one of the most important performance bottlenecks in any object-oriented system.
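From the Ruby side, the effect of those hidden classes can be observed with a short sketch. Pre, Inc1, Inc2, Base and Widget are invented names.

```ruby
module Pre;  end
module Inc1; end
module Inc2; end

class Base; end

class Widget < Base
  include Inc1
  include Inc2   # included later, so it is looked at earlier
  prepend Pre
end

# Keep only the names defined here so the output is version independent.
ours = [Pre, Widget, Inc1, Inc2, Base]
p Widget.ancestors & ours   # [Pre, Widget, Inc2, Inc1, Base]

# superclass skips the include classes that sit between Widget and Base.
p Widget.superclass         # Base

# class skips the singleton class, even once the object has one.
w = Widget.new
def w.special; :only_me; end
p w.class                   # Widget
p w.singleton_methods       # [:special]
```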

What are some good examples of Mixins and or Traits?

I was reading up on Ruby and learned about its mixin pattern, but couldn't think of much useful mixin functionality (most likely because I'm not used to thinking that way). So I was wondering: what would be good examples of useful mixin functionality?
Thanks
Edit: A bit of background. I'm coming from C++ and other object-oriented languages. My doubt here is that Ruby says it doesn't inherit from mixins, yet I keep seeing mixins as multiple inheritance, so I fear I'm trying to categorize them too soon into my comfort zone and don't really grok what a mixin is.
They are usually used to add some form of standard functionality to a class, without having to redefine it all. You can probably think of them a bit like interfaces in Java, but instead of just defining a list of methods that need to be implemented, many of them will actually be implemented by including the module.
There are a few examples in the standard library:
Singleton - A module that can be mixed into any class to make it a singleton. The new method is made private, and an instance class method is added, which ensures that there is only ever one instance of that class in your application.
Comparable - If you include this module in a class, defining the <=> method, which compares the current instance with another object and says which is greater, is enough to provide <, <=, ==, >=, >, and between? methods.
Enumerable - By mixing in this module, and defining an each method, you get support for all the other related methods such as collect, inject, select, and reject. If the elements also implement the <=> method, then it will also support sort, min, and max.
DataMapper is also an interesting example of what can be done with a simple include statement, taking a standard class, and adding the ability to persist it to a data store.
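Two of the standard-library mixins mentioned above, in a minimal sketch. Version and Registry are hypothetical classes; Comparable and Singleton are the real modules.

```ruby
require 'singleton'

class Version
  include Comparable
  attr_reader :major, :minor

  def initialize(major, minor)
    @major = major
    @minor = minor
  end

  # The only comparison method we write ourselves.
  def <=>(other)
    [major, minor] <=> [other.major, other.minor]
  end
end

p Version.new(1, 2) < Version.new(1, 10)                            # true
p Version.new(2, 0).between?(Version.new(1, 0), Version.new(3, 0))  # true
p Version.new(1, 0) == Version.new(1, 0)                            # true

class Registry
  include Singleton
end

p Registry.instance.equal?(Registry.instance)   # true

new_is_private = false
begin
  Registry.new
rescue NoMethodError
  new_is_private = true
end
p new_is_private                                 # true
```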
Well the usual example I think is Persistence
module Persistence
  def load(sFileName)
    puts "load code to read #{sFileName} contents into my_data"
  end
  def save(sFileName)
    puts "Uber code to persist #{@my_data} to #{sFileName}"
  end
end
class BrandNewClass
  include Persistence
  attr :my_data
  def data=(someData)
    @my_data = someData
  end
end
b = BrandNewClass.new
b.data = "My pwd"
b.save "MyFile.secret"
b.load "MyFile.secret"
Imagine the module is written by a Ruby ninja, which persists the state of your class to a file.
Now suppose I write a brand new class, I can reuse the functionality of persistence by mixing it in by saying include ModuleILike. You can even include modules at runtime. I get load and save methods for free by just mixing it in. These methods are just like the ones that you wrote yourself for your class. Code/Behavior/Functionality-reuse without inheritance!
So what you're doing is including methods to the method table for your class (not literally correct but close).
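The "not literally correct" part can be seen by asking Ruby where the method lives. This is a trimmed-down sketch of the example above, with save returning a string instead of printing.

```ruby
module Persistence
  def save(file_name)
    "would persist to #{file_name}"
  end
end

class BrandNewClass
  include Persistence
end

# The class's own method table does not contain save...
p BrandNewClass.instance_methods(false).include?(:save)   # false

# ...the method still belongs to the module, which sits right
# behind the class in its ancestor chain.
p BrandNewClass.instance_method(:save).owner              # Persistence
p BrandNewClass.ancestors.first(2)                        # [BrandNewClass, Persistence]

p BrandNewClass.new.save("MyFile.secret")                 # "would persist to MyFile.secret"
```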
In Ruby, the reason mixins aren't multiple inheritance is not that combining mixin methods is a one-time thing; it is that a class still has exactly one superclass, and an included module is slotted into that single, linear ancestor chain. Ruby's modules and classes are open to modification, and that applies to mixins too: if you mix a module into your class and then add a method to the module, the method is available to your class, just as if you had done it in the opposite order.
So it's not like ordering an ice-cream cone. If you get chocolate sprinkles and toffee bits as your mixins and walk away with your cone, your cone won't change when someone adds multicolored sprinkles to the chocolate sprinkles bin back at the ice-cream shop. Your class, the ice cream cone, does change when the mixin module, the bin of sprinkles, is modified, because the class keeps a reference to the module rather than a scoop of its contents.
When you include a module in Ruby, it calls Module#append_features on that module, which inserts the module into the includer's ancestor chain; no methods are copied.
In that sense a mixin works much like inheritance, which is more like delegation. If your class doesn't know how to do something, it asks its ancestors. In an open-class environment, those ancestors may have been modified after the class was created.
It's like a real-life parent-child relationship. Your mother might have learned how to juggle after you were born, but if someone asks you to juggle and you ask her to do it for you (pure delegation), she'll be able to at that point, even though you were created before her ability to juggle was.
That also means there is no need to modify Module#append_features to keep a list of includers and update them from the method_added callback; that would be a big shift from standard Ruby and could cause major issues when working with others' code.
As for a real world example, Enumerable is awesome. If you define #each and include Enumerable in your class, then that gives you access to a whole host of iterators, without you having to code each and every one.
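A minimal sketch of that, with a hypothetical Playlist class: each is the only iterator written by hand, and everything else comes from Enumerable.

```ruby
class Playlist
  include Enumerable

  def initialize(*tracks)
    @tracks = tracks
  end

  # The one method Enumerable needs from us.
  def each
    return enum_for(:each) unless block_given?
    @tracks.each { |track| yield track }
    self
  end
end

list = Playlist.new("b-side", "anthem", "coda")

p list.map(&:upcase)                    # ["B-SIDE", "ANTHEM", "CODA"]
p list.select { |t| t.include?("a") }   # ["anthem", "coda"]
p list.include?("coda")                 # true

# These work because the elements (strings) implement <=>.
p list.sort                             # ["anthem", "b-side", "coda"]
p list.min                              # "anthem"
```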
It is largely used as one might use multiple inheritance in C++ or implementing interfaces in Java/C#. I'm not sure where your experience lies, but if you have done those things before, mixins are how you would do them in Ruby. It's a systemized way of injecting functionality into classes.

Resources