Using `extend self` and `class << self` in Ruby Modules

Using `extend self` and `class << self` in Ruby Modules - ruby

I sometimes write modules that will only contain module methods (as opposed to module instance methods) (are there better names for these?). These modules should not be included in classes because that would have no effect and be misleading to a reader. So I'd like it to be as clear as possible to the reader that these modules contain no instance methods.
If I define all methods with .self, then a reader has to inspect all methods to ensure that this module contains no instance methods. If I instead use class << self or extend self then it is automatic; as soon as the reader sees this, they know.
I think extend self is best becuase with class << self one has to find its corresponding end; that is, it may not apply to all methods in the module.
So is it a good idea, and a best practice, to use extend self in cases like this?
Also, is there any difference at runtime between enclosing all methods in class << self as opposed to using extend self?

I sometimes write modules that will only contain module methods (as opposed to module instance methods) (are there better names for these?).
Singleton, meaning a class with a single instance. Here that "single instance" is the Module instance.
If I define all methods with .self, then a reader has to inspect all methods to ensure that this module contains no instance methods
The module's documentation should make this clear. If the user of a module has to study the code to understand your module, that is a documentation failure.
What does extend self do?
So I'd like it to be as clear as possible to the reader that these modules contain no instance methods.
extend self does the opposite. It makes all the instance methods also be class methods. It's equivalent to YourModule.extend(YourModule).
module YourModule
def some_method
23
end
extend self
end
Is the same as...
module YourModule
def some_method
23
end
end
YourModule.extend(YourModule)
Which is similar to...
module YourModule
def some_method
23
end
def self.some_method
23
end
end
Why would you do this? To allow both...
YourModule.some_method
and also...
class SomeClass
extend YourModule
end
SomeClass.some_method
There are edge cases where you might want this, but for general use I would argue this is an anti-pattern. The first is using a module as a singleton, the second is using the module as a mixin or trait. These are two rather different design goals for a module. Trying to be both will compromise the design of both.
Pros and Cons.
Since the primary use case of being both a singleton and a mixin is an anti-pattern, I would argue use class << self, with def self.method occasionally, and module_function and extend self never.
class << self
Pros
All class definitions are grouped together.
The block scope makes it clear what affects the class and what affects the instances.
Indentation makes it clear what is in the block.
IDEs can clearly identify what is in the block.
It allows using normal declarations like attr_accessor on the class.
It is documented.
It is common.
Rubocop approved.
Cons
When looking at an individual method, it's not as obvious as def self.method.
def self.method
Pros
It's obvious it's a class method from looking at the method.
It is documented.
It is common.
Rubocop approved.
Cons
You might forget to add the self..
It allows mixing of class and instance methods making the reader hunt through the code.
It does not help using attr_accessor and friends on the class.
extend self
Pros
It allows your module to act as both a singleton (YourModule.method) and a mixin (extend YourModule)... which is also a con.
Cons
It is obscure; many (most?) won't know to look for it or what it means if they find it.
It is not documented (or if it is, I can't find it).
Individual methods look like instance methods.
It can appear anywhere in the module, and there's no consensus where it should go, making it action at a distance.
It affects the meaning of code before it, the one case I can think of this in Ruby, further making it action at a distance.
Rubocop prefers module_function to extend self, though doesn't explain why. For my guesses, see below.
It allows your module to act as both a singleton (YourModule.method) and a mixin (extend YourModule). Those are two rather different use cases making this an anti-pattern.
module_function
I've never heard of this either, but it came up when searching for extend self. I would also say to never use this, use class << self, but it's better than extend self.
Pros
It's at least mentioned in the Modules and Classes documentation.
It's documented.
It works like private in that it affects all methods below it (though this is also a con, see below).
If there are to be no instance methods, it must appear at the top of the module.
Rubocop approved.
Cons
It is obscure; many (most?) won't know to look for it or what it means when they find it.
Individual methods look like instance methods.
It affects the meaning of distant code after it making it action at a distance.

I don't see why it should matter how you decide to define the module methods. Consider simply raising an exception if the module is included in another module (which may be a class). You can do that with the callback (a.k.a. "hook") method Module#included. Here's an example.
module M
# This module is not to be included in a class because
# it contains no instance methods.
def self.included(klass)
raise "\nYou intended to include this module in #{klass}. You must be out of\nyour mind! It does no harm but there is no point in doing so\nbecause this module contains no instance methods. Duh!"
end
def self.hi
puts "Hi, guys"
end
end
M.hi
Hi, guys
class C
include M
end
RuntimeError:
You intended to include this module in C. You must be out of
your mind! It does no harm but there is no point in doing so
because this module contains no instance methods. Duh!

Related

Should mixins make assumptions about their including class?

I found examples of a mixin that makes assumptions about what instance variables an including class has. Something like this:
module Fooable
def calculate
#val_one + #val_two
end
end
class Bar
attr_accessor :val_one, :val_two
include Fooable
end
I found arguments for and against whether it's a good practice. The obvious alternative is passing val_one and val_two as parameters, but that doesn't seem as common, and having more heavily parameterized methods could be a downside.
Is there conventional wisdom regarding a mixin's dependence on class state? What are the advantages/disadvantages of reading values from instance variables vs. passing them in as parameters? Alternatively, does the answer change if you start modifying instance variables instead of just reading them?

It is not a problem at all to assume in a module some properties about the class that includes/prepends it. That is usually done. In fact, the Enumerable module assumes that a class that includes/prepends it has a each method, and has many methods that depend on it. Likewise, the Comparable module assumes that the including/prepending class has <=>. I cannot immediately come up with an example of an instance variable, but there is not a crucial difference between methods and instance variables regarding this point; the same should be said about instance variables.
Disadvantage of passing arguments without using instance variable is that your method call will be verbose and less flexible.

Rule of thumb: Mixins should never make any assumptions about the classes/modules they may be included in. However, as it usually goes, any rule has exceptions.
But first, let's talk about the first part. Specifically, accessing (depending on) including class instance variables. If your mixin depends on anything within the including class, then it means that you can not change that "anything" in the parent class with a guarantee that it would not break something. Also, you will have to document that dependency of mixin not only in documentation related to mixin, but also in the documentation of the class/module that includes the mixin. Because, down the road, the requirements may change or someone might see an opportunity in refactoring your class/module code. Obviously, that person will not dig for that class's documentation or know that that specific class/module has a section in your documentation.
Anyhow, by depending on including class internals, not only your mixin made itself dependant, but also ended up making any class/module that includes it a dependant. Which is definitely not a good thing. Because, you can not control who or which class/module has included your mixin, you will never have a confidence to introduce a change. Not having that confidence to change without a fear of breaking anything is project drainer!
The "workaround" may be - "covering it with test". But, consider yourself or someone else maintaining that code in 2 years. Will you remember to cover your new class, that includes the mixin, to make sure it complies with all mixin dependency requirements? I am sure you or the new maintainer will not.
So, from the maintenance or basic OOP principles, your mixin must not depend on any including class/module.
Now, let's talk about that there is alway an exception to the rule bit.
You can make an exception, provided that mixin dependency does not introduce "surprises" to your code. So, it is ok, if the mixin dependencies are well known among your team or they are a convention. Another case could be when the mixin is used internally and you control who uses it (basically, when you are using it within your own project).
The key advantage of OOP in developing maintainable systems was its ability to hide/encapsulate the implementation details. Making your mixin dependant on any class that includes it, is throwing all the years of OOP experience out the window.

I'd say a mixin should not make assumptions about a specific class it is included in, but it's totally fine to make assumptions about a common parent class (respectively its public methods).
Good example: It's fine to call params in a mixin that will be included in controllers.
Or, to be more precise according your example, I'd think something like this would totally be fine:
class Calculation
attr_accesor :operands
end
module SumOperation
def sum
self.operands.sum
end
end
class MyCustomCalculation < Calculation
include SumOperation
end

You should not hesitate, even for a second, to include instance variables in your mixin module when the situation calls for it.
Suppose, for example, you wrote:
class A
def initialize(h)
#h = h
end
def confirm_colour(colour)
#h[:colour] == colour
end
def confirm_size(size)
#h[:size] == size
end
def confirm_all(colour, size)
confirm_colour(colour) && confirm_size(size)
end
end
a = A.new(:colour=>:blue, :size=>:medium, :weight=>10)
a.confirm_all(:blue, :medium)
#=> true
a.confirm_all(:blue, :large)
#=> false
Now suppose someone asks for the weight be checked as well. We could add the method
def confirm_weight(weight)
#h[:weight] == weight
end
and change confirm_all to
def confirm_all(colour, size)
confirm_colour(colour) && confirm_size(size) && confirm_weight(size)
end
but there is a better way: put all the checks in a module.
module Checks
def confirm_colour(g)
#h[:colour] == g[:colour]
end
def confirm_size(g)
#h[:size] == g[:size]
end
def confirm_weight(g)
#h[:weight] == g[:weight]
end
end
Then include the module in A and run through all the checks.
class A
include Checks
def initialize(h)
#h = h
end
def confirm_all(g)
Checks.instance_methods.all? { |m| send(m, g) }
end
end
a = A.new(:colour=>:blue, :size=>:medium, :weight=>10)
a.confirm_all(:colour=>:blue, :size=>:medium, :weight=>10)
#=> true
a.confirm_all(:colour=>:blue, :size=>:large, :weight=>10)
#=> false
This has the advantage that when checks are to be added or removed, only the module is affected; no changes need to be made to the class. This example is admittedly contrived, but it is a short step to real-world situations.

Although it is common to make these assumptions, you may want to consider a different pattern in order to make more composable and testable code: Dependency Injection.
module Fooable
def add(one, two)
one + two
end
end
class Bar
attr_accessor :val_one, :val_two
include Fooable
def calculate
add #val_one, #val_two
end
end
Although it adds an extra layer of indirection, often times it will be worth it because of the potential to use the concern across a larger number of classes, as well as making it easier to test code.

ruby module_function vs including module

In ruby, I understand that module functions can be made available without mixing in the module by using module_function as shown here. I can see how this is useful so you can use the function without mixing in the module.
module MyModule
def do_something
puts "hello world"
end
module_function :do_something
end
My question is though why you might want to have the function defined both of these ways.
Why not just have
def MyModule.do_something
OR
def do_something
In what kind of cases would it be useful to have the function available to be mixed in, or to be used as a static method?

Think of Enumerable.
This is the perfect example of when you need to include it in a module. If your class defines #each, you get a lot of goodness just by including a module (#map, #select, etc.). This is the only case when I use modules as mixins - when the module provides functionality in terms of a few methods, defined in the class you include the module it. I can argue that this should be the only case in general.
As for defining "static" methods, a better approach would be:
module MyModule
def self.do_something
end
end
You don't really need to call #module_function. I think it is just weird legacy stuff.
You can even do this:
module MyModule
extend self
def do_something
end
end
...but it won't work well if you also want to include the module somewhere. I suggest avoiding it until you learn the subtleties of the Ruby metaprogramming.
Finally, if you just do:
def do_something
end
...it will not end up as a global function, but as a private method on Object (there are no functions in Ruby, just methods). There are two downsides. First, you don't have namespacing - if you define another function with the same name, it's the one that gets evaluated later that you get. Second, if you have functionality implemented in terms of #method_missing, having a private method in Object will shadow it. And finally, monkey patching Object is just evil business :)
EDIT:
module_function can be used in a way similar to private:
module Something
def foo
puts 'foo'
end
module_function
def bar
puts 'bar'
end
end
That way, you can call Something.bar, but not not Something.foo. If you define any other methods after this call to module_function, they would also be available without mixing in.
I don't like it for two reasons, though. First, modules that are both mixed in and have "static" methods sound a bit dodgy. There might be valid cases, but it won't be that often. As I said, I prefer either to use a module as a namespace or mix it in, but not both.
Second, in this example, bar would also be available to classes/modules that mix in Something. I'm not sure when this is desirable, since either the method uses self and it has to be mixed in, or doesn't and then it does not need to be mixed in.
I think using module_function without passing the name of the method is used quite more often than with. Same goes for private and protected.

It's a good way for a Ruby library to offer functionality that does not use (much) internal state. So if you (e.g.) want to offer a sin function and don't want to pollute the "global" (Object) namespace, you can define it as class method under a constant (Math).
However, an app developer, who wants to write a mathematical application, might need sin every two lines. If the method is also an instance method, she can just include the Math (or My::Awesome::Nested::Library) module and can now directly call sin (stdlib example).
It's really about making a library more comfortable for its users. They can choose themself, if they want the functionality of your library on the top level.
By the way, you can achieve a similar functionality like module_function by using: extend self (in the first line of the module). To my mind, it looks better and makes things a bit clearer to understand.
Update: More background info in this blog article.

If you want to look at a working example, check out the chronic gem:
https://github.com/mojombo/chronic/blob/master/lib/chronic/handlers.rb
and Handlers is being included in the Parser class here:
https://github.com/mojombo/chronic/blob/master/lib/chronic/parser.rb
He's using module_function to send the methods from Handlers to specific instances of Handler using that instance's invoke method.

Using Class vs Module for packaging code in Ruby

Let's say I have a bunch of related functions that have no persistent state, say various operations in a string differencing package. I can either define them in a class or module (using self) and they can be accessed the exact same way:
class Diff
def self.diff ...
def self.patch ...
end
or
module Diff
def self.diff ...
def self.patch ...
end
I can then do Diff.patch(...). Which is 'better' (or 'correct')?
The main reason I need to group them up is namespace issues, common function names are all used elsewhere.
Edit: Changed example from matrix to diff. Matrix is a terrible example as it does have state and everyone started explaining why it's better to write them as methods rather than answer the actual question. :(

In your two examples, you are not actually defining methods in a Class or a Module; you are defining singleton methods on an object which happens to be a Class or a Module, but could be just about any object. Here's an example with a String:
Diff = "Use me to access really cool methods"
def Diff.patch
# ...
end
You can do any of these and that will work, but the best way to group related methods is in a Module as normal instance methods (i.e. without self.):
module Diff
extend self # This makes the instance methods available to the Diff module itself
def diff ... # no self.
def patch ...
end
Now you can:
use this functionality from within any Class (with include Diff) or from any object (with extend Diff)
an example of this use is the extend self line which makes it possible to call Diff.patch.
even use these methods in the global namespace
For example, in irb:
class Foo
include Diff
end
Foo.new.patch # => calls the patch method
Diff.patch # => also calls Diff.patch
include Diff # => now you can call methods directly:
patch # => also calls the patch method
Note: the extend self will "modify" the Diff module object itself but it won't have any effect on inclusions of the module. Same thing happens for a def self.foo, the foo won't be available to any class including it. In short, only instance methods of Diff are imported with an include (or an extend), not the singleton methods. Only subclassing a class will provide inheritance of both instance and singleton methods.
When you actually want the inclusion of a module to provide both instance methods and singleton methods, it's not completely easy. You have to use the self.included hook:
module Foo
def some_instance_method; end
module ClassMethods
def some_singleton_method; end
end
def self.included(base)
base.send :extend, ClassMethods
end
def self.will_not_be_included_in_any_way; end
end
class Bar
include Foo
end
# Bar has now instance methods:
Bar.new.some_instance_method # => nil
# and singleton methods:
Bar.some_singleton_method # => nil

The main difference between modules and classes is that you can not instantiate a module; you can't do obj = MyModule.new. The assumption of your question is that you don't want to instantiate anything, so I recommend just using a module.
Still you should reconsider your approach: rather than using arrays of arrays or whatever you are doing to represent a Matrix, it would be more elegant to make your own class to represent a matrix, or find a good class that someone else has already written.

Ruby Modules are used to specify behaviour, pieces of related functionality.
Ruby Classes are used to specify both state and behaviour, a singular entity.
There is a maxim in software design that says that code is a liability, so use the less code possible. In the case of Ruby, the difference in code lines is cero. So you can use either way (if you don't need to save state)
If you want to be a purist, then use a Module, since you won't be using the State functionality. But I wouldn't say that using a class is wrong.
As a trivia info: In Ruby a Class is a kind of Module.
http://www.ruby-doc.org/core-1.9.3/Class.html

The following also works
Matrix = Object.new
def Matrix.add ...
def Matrix.equals ...
That's because so-called "class methods" are just methods added to a single object, and it doesn't really matter what that object class is.

As a matter of form, the Module is more correct. You can still create instances of the class, even if it has only class methods. You can think of a module here as a static class of C# or Java. Classes also always have the instance related methods (new, allocate, etc.). Use the Module. Class methods usually have something to do with objects (creating them, manipulating them).

When to use a module, and when to use a class

I am currently working through the Gregory Brown Ruby Best Practices book. Early on, he is talking about refactoring some functionality from helper methods on a related class, to some methods on module, then had the module extend self.
Hadn't seen that before, after a quick google, found out that extend self on a module lets methods defined on the module see each other, which makes sense.
Now, my question is when would you do something like this
module StyleParser
extend self
def process(text)
...
end
def style_tag?(text)
...
end
end
and then refer to it in tests with
#parser = Prawn::Document::Text::StyleParser
as opposed to something like this?
class StyleParser
def self.process(text)
...
end
def self.style_tag?(text)
...
end
end
is it so that you can use it as a mixin? or are there other reasons I'm not seeing?

A class should be used for functionality that will require instantiation or that needs to keep track of state. A module can be used either as a way to mix functionality into multiple classes, or as a way to provide one-off features that don't need to be instantiated or to keep track of state. A class method could also be used for the latter.
With that in mind, I think the distinction lies in whether or not you really need a class. A class method seems more appropriate when you have an existing class that needs some singleton functionality. If what you're making consists only of singleton methods, it makes more sense to implement it as a module and access it through the module directly.

In this particular case I would probably user neither a class nor a module.
A class is a factory for objects (note the plural). If you don't want to create multiple instances of the class, there is no need for it to exist.
A module is a container for methods, shared among multiple objects. If you don't mix in the module into multiple objects, there is no need for it to exist.
In this case, it looks like you just want an object. So use one:
def (StyleParser = Object.new).process(text)
...
end
def StyleParser.style_tag?(text)
...
end
Or alternatively:
class << (StyleParser = Object.new)
def process(text)
...
end
def style_tag?(text)
...
end
end
But as #Azeem already wrote: for a proper decision, you need more context. I am not familiar enough with the internals of Prawn to know why Gregory made that particular decision.

If it's something you want to instantiate, use a class. The rest of your question needs more context to make sense.

ruby inheritance vs mixins

In Ruby, since you can include multiple mixins but only extend one class, it seems like mixins would be preferred over inheritance.
My question: if you're writing code which must be extended/included to be useful, why would you ever make it a class? Or put another way, why wouldn't you always make it a module?
I can only think of one reason why you'd want a class, and that is if you need to instantiate the class. In the case of ActiveRecord::Base, however, you never instantiate it directly. So shouldn't it have been a module instead?

I just read about this topic in The Well-Grounded Rubyist (great book, by the way). The author does a better job of explaining than I would so I'll quote him:
No single rule or formula always results in the right design. But it’s useful to keep a
couple of considerations in mind when you’re making class-versus-module decisions:
Modules don’t have instances. It follows that entities or things are generally best
modeled in classes, and characteristics or properties of entities or things are
best encapsulated in modules. Correspondingly, as noted in section 4.1.1, class
names tend to be nouns, whereas module names are often adjectives (Stack
versus Stacklike).
A class can have only one superclass, but it can mix in as many modules as it wants. If
you’re using inheritance, give priority to creating a sensible superclass/subclass
relationship. Don’t use up a class’s one and only superclass relationship to
endow the class with what might turn out to be just one of several sets of characteristics.
Summing up these rules in one example, here is what you should not do:
module Vehicle
...
class SelfPropelling
...
class Truck < SelfPropelling
include Vehicle
...
Rather, you should do this:
module SelfPropelling
...
class Vehicle
include SelfPropelling
...
class Truck < Vehicle
...
The second version models the entities and properties much more neatly. Truck
descends from Vehicle (which makes sense), whereas SelfPropelling is a characteristic of vehicles (at least, all those we care about in this model of the world)—a characteristic that is passed on to trucks by virtue of Truck being a descendant, or specialized
form, of Vehicle.

I think mixins are a great idea, but there's another problem here that nobody has mentioned: namespace collisions. Consider:
module A
HELLO = "hi"
def sayhi
puts HELLO
end
end
module B
HELLO = "you stink"
def sayhi
puts HELLO
end
end
class C
include A
include B
end
c = C.new
c.sayhi
Which one wins? In Ruby, it turns out the be the latter, module B, because you included it after module A. Now, it's easy to avoid this problem: make sure all of module A and module B's constants and methods are in unlikely namespaces. The problem is that the compiler doesn't warn you at all when collisions happen.
I argue that this behavior does not scale to large teams of programmers-- you shouldn't assume that the person implementing class C knows about every name in scope. Ruby will even let you override a constant or method of a different type. I'm not sure that could ever be considered correct behavior.

My take: Modules are for sharing behavior, while classes are for modeling relationships between objects. You technically could just make everything an instance of Object and mix in whatever modules you want to get the desired set of behaviors, but that would be a poor, haphazard and rather unreadable design.

The answer to your question is largely contextual. Distilling pubb's observation, the choice is primarily driven by the domain under consideration.
And yes, ActiveRecord should have been included rather than extended by a subclass. Another ORM - datamapper - precisely achieves that!

I like Andy Gaskell's answer very much - just wanted to add that yes, ActiveRecord should not use inheritance, but rather include a module to add the behavior (mostly persistence) to a model/class. ActiveRecord is simply using the wrong paradigm.
For the same reason, I very much like MongoId over MongoMapper, because it leaves the developer the chance to use inheritance as a way of modelling something meaningful in the problem domain.
It's sad that pretty much nobody in the Rails community is using "Ruby inheritance" the way it's supposed to be used - to define class hierarchies, not just to add behavior.

The best way I understand mixins are as virtual classes. Mixins are "virtual classes" that have been injected in a class's or module's ancestor chain.
When we use "include" and pass it a module, it adds the module to the ancestor chain right before the class that we are inheriting from:
class Parent
end
module M
end
class Child < Parent
include M
end
Child.ancestors
=> [Child, M, Parent, Object ...
Every object in Ruby also has a singleton class. Methods added to this singleton class can be directly called on the object and so they act as "class" methods. When we use "extend" on an object and pass the object a module, we are adding the methods of the module to the singleton class of the object:
module M
def m
puts 'm'
end
end
class Test
end
Test.extend M
Test.m
We can access the singleton class with the singleton_class method:
Test.singleton_class.ancestors
=> [#<Class:Test>, M, #<Class:Object>, ...
Ruby provides some hooks for modules when they are being mixed into classes/modules. included is a hook method provided by Ruby which gets called whenever you include a module in some module or class. Just like included, there is an associated extended hook for extend. It will be called when a module is extended by another module or class.
module M
def self.included(target)
puts "included into #{target}"
end
def self.extended(target)
puts "extended into #{target}"
end
end
class MyClass
include M
end
class MyClass2
extend M
end
This creates an interesting pattern that developers could use:
module M
def self.included(target)
target.send(:include, InstanceMethods)
target.extend ClassMethods
target.class_eval do
a_class_method
end
end
module InstanceMethods
def an_instance_method
end
end
module ClassMethods
def a_class_method
puts "a_class_method called"
end
end
end
class MyClass
include M
# a_class_method called
end
As you can see, this single module is adding instance methods, "class" methods, and acting directly on the target class (calling a_class_method() in this case).
ActiveSupport::Concern encapsulates this pattern. Here's the same module rewritten to use ActiveSupport::Concern:
module M
extend ActiveSupport::Concern
included do
a_class_method
end
def an_instance_method
end
module ClassMethods
def a_class_method
puts "a_class_method called"
end
end
end

Right now, I'm thinking about the template design pattern. It just wouldn't feel right with a module.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Using `extend self` and `class << self` in Ruby Modules - ruby

Related

Should mixins make assumptions about their including class?

ruby module_function vs including module

Using Class vs Module for packaging code in Ruby

When to use a module, and when to use a class

ruby inheritance vs mixins

Categories

Resources