When to use a module and when to use a global scope - ruby

This is a simple question: Should I have a module that contains all my classes (and submodules):
module ProjectName
class Something
# code
end
module Abc
# code
end
end
Or simply everything in a global scope:
class Something
# code
end
module Abc
# code
end

It is considered good practice not to pollute your global scope. Namespacing your application into modules, encapsulating related behaviour makes it easier to comprehend, helps avoid naming conflicts, and lets you easily port parts of your code into other applications or contexts.
In Ruby it also gives you a natural way of storing module wide constants, and gives you the option to add methods that don't need a containing object directly to the module.
In some languages, (notably JavaScript) scoping also has an impact on performance, as keeping objects in the global scope might prevent them from getting qualified for garbage collection.

Related

How can I determine what objects a call to ruby require added to the global namespace?

Suppose I have a file example.rb like so:
# example.rb
class Example
def foo
5
end
end
that I load with require or require_relative. If I didn't know that example.rb defined Example, is there a list (other than ObjectSpace) that I could inspect to find any objects that had been defined? I've tried checking global_variables but that doesn't seem to work.
Thanks!
Although Ruby offers a lot of reflection methods, it doesn't really give you a top-level view that can identify what, if anything, has changed. It's only if you have a specific target you can dig deeper.
For example:
def tree(root, seen = { })
seen[root] = true
root.constants.map do |name|
root.const_get(name)
end.reject do |object|
seen[object] or !object.is_a?(Module)
end.map do |object|
seen[object] = true
puts object
[ object.to_s, tree(object, seen) ]
end.to_h
end
p tree(Object)
Now if anything changes in that tree structure you have new things. Writing a diff method for this is possible using seen as a trigger.
The problem is that evaluating Ruby code may not necessarily create all the classes that it will or could create. Ruby allows extensive modification to any and all classes, and it's common that at run-time it will create more, or replace and remove others. Only libraries that forcibly declare all of their modules and classes up front will work with this technique, and I'd argue that's a small portion of them.
It depends on what you mean by "the global namespace". Ruby doesn't really have a "global" namespace (except for global variables). It has a sort-of "root" namespace, namely the Object class. (Although note that Object may have a superclass and mixes in Kernel, and stuff can be inherited from there.)
"Global" constants are just constants of Object. "Global functions" are just private instance methods of Object.
So, you can get reasonably close by examining global_variables, Object.constants, and Object.instance_methods before and after the call to require/require_relative.
Note, however, that, depending on your definition of "global namespace" (private) singleton methods of main might also count, so you check for those as well.
Of course, any of the methods the script added could, when called at a later time, themselves add additional things to the global scope. For example, the following script adds nothing to the scope, but calling the method will:
class String
module MyNonGlobalModule
def self.my_non_global_method
Object.const_set(:MY_GLOBAL_CONSTANT, 'Haha, gotcha!')
end
end
end
Strictly speaking, however, you asked about adding "objects" to the global namespace, and neither constants nor methods nor variables are objects, soooooo … the answer is always "none"?

ruby module_function vs including module

In ruby, I understand that module functions can be made available without mixing in the module by using module_function as shown here. I can see how this is useful so you can use the function without mixing in the module.
module MyModule
def do_something
puts "hello world"
end
module_function :do_something
end
My question is though why you might want to have the function defined both of these ways.
Why not just have
def MyModule.do_something
OR
def do_something
In what kind of cases would it be useful to have the function available to be mixed in, or to be used as a static method?
Think of Enumerable.
This is the perfect example of when you need to include it in a module. If your class defines #each, you get a lot of goodness just by including a module (#map, #select, etc.). This is the only case when I use modules as mixins - when the module provides functionality in terms of a few methods, defined in the class you include the module it. I can argue that this should be the only case in general.
As for defining "static" methods, a better approach would be:
module MyModule
def self.do_something
end
end
You don't really need to call #module_function. I think it is just weird legacy stuff.
You can even do this:
module MyModule
extend self
def do_something
end
end
...but it won't work well if you also want to include the module somewhere. I suggest avoiding it until you learn the subtleties of the Ruby metaprogramming.
Finally, if you just do:
def do_something
end
...it will not end up as a global function, but as a private method on Object (there are no functions in Ruby, just methods). There are two downsides. First, you don't have namespacing - if you define another function with the same name, it's the one that gets evaluated later that you get. Second, if you have functionality implemented in terms of #method_missing, having a private method in Object will shadow it. And finally, monkey patching Object is just evil business :)
EDIT:
module_function can be used in a way similar to private:
module Something
def foo
puts 'foo'
end
module_function
def bar
puts 'bar'
end
end
That way, you can call Something.bar, but not not Something.foo. If you define any other methods after this call to module_function, they would also be available without mixing in.
I don't like it for two reasons, though. First, modules that are both mixed in and have "static" methods sound a bit dodgy. There might be valid cases, but it won't be that often. As I said, I prefer either to use a module as a namespace or mix it in, but not both.
Second, in this example, bar would also be available to classes/modules that mix in Something. I'm not sure when this is desirable, since either the method uses self and it has to be mixed in, or doesn't and then it does not need to be mixed in.
I think using module_function without passing the name of the method is used quite more often than with. Same goes for private and protected.
It's a good way for a Ruby library to offer functionality that does not use (much) internal state. So if you (e.g.) want to offer a sin function and don't want to pollute the "global" (Object) namespace, you can define it as class method under a constant (Math).
However, an app developer, who wants to write a mathematical application, might need sin every two lines. If the method is also an instance method, she can just include the Math (or My::Awesome::Nested::Library) module and can now directly call sin (stdlib example).
It's really about making a library more comfortable for its users. They can choose themself, if they want the functionality of your library on the top level.
By the way, you can achieve a similar functionality like module_function by using: extend self (in the first line of the module). To my mind, it looks better and makes things a bit clearer to understand.
Update: More background info in this blog article.
If you want to look at a working example, check out the chronic gem:
https://github.com/mojombo/chronic/blob/master/lib/chronic/handlers.rb
and Handlers is being included in the Parser class here:
https://github.com/mojombo/chronic/blob/master/lib/chronic/parser.rb
He's using module_function to send the methods from Handlers to specific instances of Handler using that instance's invoke method.

Applying how ActiveRecord uses modules to other Ruby projects

I decided to dig into the ActiveRecord code for Rails to try and figure out how some of it works, and was surprised to see it comprised of many modules, which all seem to get included into ActiveRecord::Base.
I know Ruby Modules provide a means of grouping related and reusable methods that can be mixed into other classes to extend their functionality.
However, most of the ActiveRecord modules seem highly specific to ActiveRecord. There seems to be references to instance variables in some modules, suggesting the modules are aware of the internals of the overall ActiveRecord class and other modules.
This got me wondering about how ActiveRecord is designed and how this logic could or should be applied to other Ruby applications.
It is a common 'design pattern' to split large classes into modules that are not really reusable elsewhere simply to split up the class file? Is it seen as good or bad design when modules make use of instance variables that are perhaps defined by a different module or part of the class?
In cases where a class can have many methods and it would become cumbersome to have them all defined in one file, would it make as much sense to simply reopen the class in other files and define more methods in there?
In a command line application I am working on, I have a few classes that do various functions, but I have a top level class that provides an API for the overall application - what I found is that class is becoming bogged down with a lot of methods that really hand off work to other class, and is like the glues that holds the pieces of the application together. I guess I am wondering if it would make sense for me to split out some of the related methods into modules, or re-open the class in different code files? Or is there something else I am not thinking of?
I've created quite a few modules that I didn't intend to be highly reusable. It makes it easier to test a group of related methods in isolation, and classes are more readable if they're only a few hundred lines long rather than thousands. As always, there's a balance to be struck.
I've also created modules that expect the including class to define instance methods so that methods defined on the module can use them. I wouldn't say it's terribly elegant, but it's feasible if you're only sharing code between a couple of classes and you document it well. You could also raise an exception if the class doesn't define the methods you want:
module Aggregator
def aggregate
unless respond_to?(:collection)
raise Exception.new("Classes including #{self} must define #collection")
end
# ...
end
end
I'd be a lot more hesitant to depend on shared instance variables.
The biggest problem I see with re-opening classes is simply managing your source code. Would you end up with multiple copies of aggregator.rb in different directories? Is the load order of those files determinate, and does that affect overriding or calling methods in the class? At least with modules, the include order is explicit in the source.
Update: In a comment, Stephen asked about testing a module that's meant to be included in a class.
RSpec offers shared_examples as a convenient way to test shared behavior. You can define the module's behaviors in a shared context, and then declare that each of the including classes should also exhibit that behavior:
# spec/shared_examples/aggregator_examples.rb
shared_examples_for 'Aggregator' do
describe 'when aggregating records' do
it 'should accumulate values' do
# ...
end
end
end
# spec/models/model_spec.rb
describe Model
it_should_behave_like 'Aggregator'
end
Even if you aren't using RSpec, you can still create a simple stub class that includes your module and then write the tests against instances of that class:
# test/unit/aggregator_test.rb
class AggregatorTester
attr_accessor :collection
def initialize(collection)
self.collection = collection
end
include Aggregator
end
def test_aggregation
assert_equal 6, AggregatorTester.new([1, 2, 3]).aggregate
end

Ruby modules and classes

I am just starting Ruby and learning the concept of modules. I understand that one use of modules is to better organize your code and avoid name clashes. Let's say I have bunch of modules like this (I haven't included the implementation as that's not important)
:
module Dropbox
class Base
def initialize(a_user)
end
end
class Event < Base
def newFile?
end
def newImage?
end
end
class Action < Base
def saveFile(params)
end
end
end
and another module:
module CustomURL
class Base
def initialize(a_user, a_url, a_method, some_args, a_regex)
end
end
class Event < Base
def initialize(a_user, a_url, a_method, some_args, a_regex)
end
def change?
end
end
class Action < Base
def send_request(params)
end
end
end
I am going to have a bunch of these modules (10+, for gmail, hotmail, etc...). What I am trying to figure out is, is this the right way to organize my code?
Basically, I am using module to represent a "service" and all services will have a common interface class (base for initializing, action for list of actions and event for monitoring).
You are defining families of related or dependent classes here. Your usage of modules as namespaces for these families is correct.
Also with this approach it would be easy to build abstract factory for your classes if they had compatible interface. But as far as I see this is not the case for current classes design: for example Dropbox::Event and CustomURL::Event have completely different public methods.
You can reevaluate design of your classes and see if it is possible for them to have uniform interface so that you can use polymorphism and extract something like BaseEvent and BaseAction so that all Events and Actions will derive from these base classes.
Update: As far as you define services, it might be useful to define top-level module like Service and put all your classes inside this module. It will improve modularity of your system. If in the future you would refactor out some base classes for your modules services, you can put them in the top-level namespace. Then your objects will have readable names like these:
Service::Dropbox::Event
Service::Dropbox::Action
Service::CustomURL::Event
Service::CustomURL::Action
Service::BaseEvent
Service::BaseAction
I have some similar code at work, only I'm modeling networking gear.
I took the approach of defining a generic class with the common attributes and methods, including a generic comparator, and then sub-class that for the various models of hardware. The sub-classes contain the unique attributes for that hardware, plus all the support code necessary to initialize or compare an instance of that equipment with another.
As soon as I see the need to write a method similar to another I wrote I think about how I can reuse that code by promoting it to the base-class. Often this involves changing how I am passing parameters, and instead of using formal parameters, I end up using a hash, then pulling what I need from it, keeping the method interface under control.
Because you would have a lot of sub-classes to a base class, it's important to take your time and think out how that base-class should work. As you add sub-classes the task of refactoring the base will get harder because you will have to change other sub-classes. I always find I go down some blind-alleys and have to back up a bit, but as the class matures that should happen less and less.
As you will notice soon, there is no 'right way' of organizing code.
There are subtle differences in readability that are mostly subjective. The way you are organizing classes is just fine for releasing your code as a gem. It usually isn't needed in code that won't be included in other peoples projects, but it won't hurt either.
Just ask yourself "does this make sense for someone reading my code who has no idea what my intention is?".

Loading external files within a class/module

I have an external file: path_to_external_file.rb with some class definition:
class A
some_definitions
end
And I want to load that within module B so that the class A defined above can be referred to as B::A. I tried:
class B
load('path_to_external_file.rb')
end
but A is defined in the main environment, not in B:
A #=> A
B.constants # => []
How can I load external files within some class/module?
Edit
Should I read the external files as strings, and evaluate them within Class.new{...}, and include that class within B?
You cannot. At least using load or require, the Ruby files will always be evaluated in a top context.
You can work around that problem in two ways:
Define class B::A directly (but you are probably trying to avoid that)
Use eval(File.read("path_to_external_file.rb")) within your B class
Edit: Maybe, this library is interesting for you: https://github.com/dreamcat4/script/blob/master/intro.txt
Generally, it's a bad idea to define a class as "class A" but then "magically" make it contained by module B. If you want to refer to class A as B::A, you should define it using either:
module B
class A
# contents
end
end
or:
class B::A
# contents
end
Otherwise anyone who reads your code will be confused. In this case, you don't gain anything in clarity, brevity, or convenience by using "tricks", so straightforward code is better. There is a lesson here: the metaprogramming features of Ruby are great, but there is no need to use them gratuitously. Only use them when you really gain something from doing so. Otherwise you just make your code hard to understand.
BUT, having read your comment, it looks like there is really a good reason to do something like this in your case. I suggest that the following solution would be even better than what you are envisioning:
m = Module.new
m.module_eval("class C; end")
m.constants
=> [:C]
m.const_get(:C)
=> #<Module:0xfd0da0>::C
You see? If you want a "guaranteed unique" namespace, you can use an anonymous module. You could store these modules in a hash or other data structure, and pull the classes out of them as needed. This solves the problem you mentioned, that the users of your app are going to be adding their own classes, and you don't want the names to collide.

Resources