I am trying to understand the working of ruby open classes. In particular how does a method gets added to an existing class?
There must be some logic applied to identify if the class already exists.
Can someone please explain the concept.
Thanks in advance.
Class names are constants, so it's fairly easy to see if a class name already exists:
Object.constants.grep(/Array/) #=> ["Array"]
So if a class is already defined (Object.const_get('Array').is_a? Class) the methods get added to it, otherwise a new class gets defined. To be a bit more precise this happens on the C side of Ruby, but is more or less what it boils down to.
Underneath Ruby, there are messages.
Suppose you've a class A, and a is an instance of A. There actually are several classes involved:
Basic Object
-> Object
-> A
-> a
When you call method foo on a, what's really happening is you're sending the message "call foo" to a. It checks if the instance method is defined for a (as in, the singleton/eigenclass), then tries instance methods of A, then Object, and finally Basic Object. If all else fails, the VM sends a new message (i.e. "method not found") and the procedure starts again until it gets caught (by default, it raises an exception).
You can add a new method (i.e. a message handler) to any of the classes in the hierarchy, in order to change the behavior of a. If you add an instance method to A or Object, then a will have it. If you add a method to a only (i.e. def a.bar; 'bar'; end), then other instances of A will have it.
Ruby's classes (I'd argue it would more appropriate to say ruby's objects) are open in the sense that you can add these new methods, i.e. message handlers, at any point in time -- including long after they've been defined.
Related
At the office, we had this little brain teaser:
class Bicycle
def spares
{tire_size: 21}.merge(local_spares)
end
def local_spares
{}
end
end
class RoadBike < Bicycle
def local_spares
{tape_color: 'red'}
end
end
roadbike = RoadBike.new
roadbike.spares
Most of us didn't get what roadbike.spares output is unless we ran the whole code in the console. We had our different hunch on the behaviour but can anyone break it down to me what really happened here?
If anyone's wondering, the output is {tire_size: 21, tape_color: 'red'}
It's quite obvious, RoadBike#spares (which is the same as Bicycle#spares, because RoadBike doesn't override this method) calls internally RoadBike#local_spares, merges its returned value to {tire_size: 21} hash and returns the result.
No surprise at all.
This is called method overriding. The RoadBike#local_spares method overrides the Bicycle#local_spares method because RoadBike inherits from Bicycle.
When you send a message to an object, Ruby will try to find a method with the same name as the message to execute. It first looks at the object's class, then at that class's superclass, then at that class's superclass and so on.
When you send a RoadBike object the spares message, it will first try (and fail) to find a method named spares in RoadBike. Then it will look into its superclass (Bicycle) and succeed.
The body of that method contains a message send of local_spares to the receiver object. Again, Ruby tries to find a method named local_spares in the class of the object (still RoadBike) and succeeds, so it executes that method.
This is all just standard inheritance and method overriding. There's nothing really special or surprising or "brain teaserish" about that. In fact, this is pretty much the whole point of inheritance and method overriding: that more specialized objects can provide more specialized implementations than their more general parents.
Note: the method lookup algorithm is in reality a bit more involved than that.
First off, what happens if there is no more superclass, and the method still hasn't been found? In that case, Ruby will send the message method_missing to the receiver and pass along the name of the method that it tried to look up. Only if the method_missing method also can't be found, will Ruby raise a NoMethodError.
And secondly, there are singleton classes, included modules, prepended modules, and Refinements to consider. So, really Ruby will look at the object's singleton class first, before it looks at the class and then the superclass and so on. And "look at the class" actually means look first at the prepended modules (in reverse order), then at the class itself, and then at the included modules (again in reverse order). Oh, and that has to be done recursively as well, so for each prepended module look first at the prepended modules of that module, then at the module itself, then the included modules and so forth.
(Oh, and Refinements throw a another wrinkle in this, obviously.)
Most of the Ruby implementations simplify this algorithm greatly by separating their internal notion of what a "class" is from the programmer's notion, by introducing the concept of "hidden classes" (YARV calls them "virtual classes") that exist inside the implementation but aren't exposed to the programmer. So, for example, the singleton class of an object would be a hidden class, and the class pointer of the object would simply point to the singleton class and the superclass pointer of the singleton class would point to the actual class of the object. When you include a module into a class, the implementation will synthesize a hidden class (which YARV calls an "include class") for the module and insert it as the superclass of the class and make the former superclass the superclass of the hidden class. Methods like Object#class and Class#superclass would then simply follow the superclass chain until it finds the first non-hidden class and return that, instead of returning the class/superclass pointer directly.
This makes methods like Object#class, Class#superclass and Module#ancestors slightly more complex, because they have to skip hidden classes, but it simplifies the method lookup algorithm, which is one of the most important performance bottlenecks in any object-oriented system.
Serialization in Ruby can be done via the built-in Marshal module.
It provides methods for dumping an object and loading it.
I'm writing some serialization and wondering how an object can be loaded and all of its attributes restored, without actually calling the constructor?
For example, suppose I have a class
class Test
def initialize(id)
#id = id
end
end
Suppose I serialized it to (assuming a very simplified scheme that likely doesn't work in general)
{
"Test": {
"id": 3
}
}
When I want to load it back, I figured I'll just instantiate a new Test object and set its attributes. However, calling the new method will throw an exception because I didn't pass in an ID yet. Indeed, I haven't gotten to the point where I have read the ID, and in general, a constructor could take any arbitrary number of arguments, and I don't want to have to write custom logic for each and every class.
When you load the object via Marshal.load, it just somehow works. How does it work?
See this answer for an explanation of what the default Class::new does. You can mimic this behavior without adding the call to initialize. Instead, you would manually set the state of the class via something similar to instance_variable_set. Note, this is just a suggesstion of how you can implement this yourself. The actual Marshal.load is likely written in c, but it would do something similar.
This class takes in a hash, and depending on the input, it converts temperatures.
class Temp
def initialize(opt={})
if opt.include?(:cold)
#colddegree=opt[:cold]
end
end
def self.from_cold(cel)
Temp.new(:cold => cel) <= instance of class created in class method
end
end
An instance of a class is created inside a class method. Why is it necessary to do so, and what it does it do, what is the reasoning behind it?
Why would we need to create an instance of a class inside the class instead of the main?
Why would it be used inside a class method? Can there be a time when it would be required inside a regular object methods?
What is it calling and what is happening when it is creating an instance inside a class method? what difference does it make?
Rubyists don't always use the word, but self.from_cold is a factory. This allows you to expose a Temp.from_cold(-40) method signature that programmers consuming your API can understand readily without having to concern themselves with the boilerplate of, say, learning that you have an implicitly required parameter named :cold.
It becomes extra useful when you have a work-performing object that needs to be initialized and then invoked, such as TempConverter.new(cel: -40).to_fahrenheit. Sometimes it's cleaner to expose a TempConverter.cel_to_fahr(-40) option to be consumed by other libraries. It's mostly just a way of hiding complexity inside of this class so that other classes with temp conversion needs don't have to violate the Law of Demeter.
An important thing to understand is that unlike C#, JavaScript, or C++, new is not a keyword in Ruby. It's just a message which objects of class Class understand. The default behaviour is to allocate and initialize a new object of that class, but there's nothing stopping you overriding it, for example:
class Foo
def self.new
puts "OMG i'm initializing an object"
super
end
end
So to answer your last question, it makes no difference where Temp.new is called. In all cases, it sends the message new to the object Temp (which is also a class, but remember that almost everything in Ruby is an object, including classes), which creates and returns a new instance.
I'm not going to attempt to answer your other two questions, because the other answer already does.
I have a class something like below, and I used instance variables (array) to avoid using lots of method parameters.
It works as I expected but is that a good practice?
Actually I wouldn't expect that worked, but I guess class methods are not working as static methods in other languages.
class DummyClass
def self.dummy_method1
#arr = []
# Play with that array
end
def self.dummy_method2
# use #arr for something else
end
end
The reason instance variables work on classes in Ruby is that Ruby classes are instances themselves (instances of class Class). Try it for yourself by inspecting DummyClass.class. There are no "static methods" in the C# sense in Ruby because every method is defined on (or inherited into) some instance and invoked on some instance. Accordingly, they can access whatever instance variables happen to be available on the callee.
Since DummyClass is an instance, it can have its own instance variables just fine. You can even access those instance variables so long as you have a reference to the class (which should be always because class names are constants). At any point, you would be able to call ::DummyClass.instance_variable_get(:#arr) and get the current value of that instance variable.
As for whether it's a good thing to do, it depends on the methods.
If #arr is logically the "state" of the instance/class DummyClass, then store it in instance variable. If #arr is only being used in dummy_method2 as an operational shortcut, then pass it as an argument. To give an example where the instance variable approach is used, consider ActiveRecord in Rails. It allows you to do this:
u = User.new
u.name = "foobar"
u.save
Here, the name that has been assigned to the user is data that is legitimately on the user. If, before the #save call, one were to ask "what is the name of the user at this point", you would answer "foobar". If you dig far enough into the internals (you'll dig very far and into a lot of metaprogramming, you'll find that they use instance variables for exactly this).
The example I've used contains two separate public invocations. To see a case where instance variables are still used despite only one call being made, look at the ActiveRecord implementation of #update_attributes. The method body is simply load(attributes, false) && save. Why does #save not get passed any arguments (like the new name) even though it is going to be in the body of save where something like UPDATE users SET name='foobar' WHERE id=1;? It's because stuff like the name is information that belongs on the instance.
Conversely, we can look at a case where instance variables would make no sense to use. Look at the implementation of #link_to_if, a method that accepts a boolean-ish argument (usually an expression in the source code) alongside arguments that are ordinarily accepted by #link_to such as the URL to link to. When the boolean condition is truthy, it needs to pass the rest of the arguments to #link_to and invoke it. It wouldn't make much sense to assign instance variables here because you would not say that the invoking context here (the renderer) contains that information in the instance. The renderer itself does not have a "URL to link to", and consequently, it should not be buried in an instance variable.
Those are class instance variables and are a perfectly legitimate things in ruby: classes are objects too (instances of Class) and so have instance variables.
One thing to look out for is that each subclass will have its own set of class instance variables (after all these are different objects): If you subclassed DummyClass, class methods on the subclass would not be able to see #arr.
Class variables (##foo) are of course the other way round: the entire class hierarchy shares the same class variables.
method_missing
*obj.method_missing( symbol h , args i ) → other_obj
Invoked by Ruby when obj is sent a
message it cannot handle. symbol is
the symbol for the method called, and
args are any arguments that were
passed to it. The example below
creates a class Roman, which responds
to methods with names consisting of
roman numerals, returning the
corresponding integer values. A more
typical use of method_missing is to
implement proxies, delegators, and
forwarders.
class Roman
def roman_to_int(str)
# ...
end
def method_missing(method_id)
str = method_id.id2name
roman_to_int(str)
end
end
r = Roman.new
r.iv ! 4
r.xxiii ! 23
r.mm ! 2000
I just heard about method-missing and went to find out more in Programming Ruby but the above explanation quoted from the book is over my head. Does anyone have an easier explanation? More specifically, is method-missing only used by the interpreter or is there ever a need to call it directly in a program (assuming I'm just writing web apps, as opposed to writing code for NASA)?
It's probably best to not think of ruby as having methods. When you call a ruby "method" you are actually sending a message to that instance (or class) and if you have defined a handler for the message, it is used to process and return a value.
So method_missing is a special definition that gets called whenever ruby cannot find an apropriate handler. You could also think of it like a * method.
Ruby doesn't have any type enforcement, and likewise doesn't do any checking as to what methods an object has when the script is first parsed, because this can be dynamically changed as the application runs.
What method_missing does, is let you intercept and handle calls to methods that don't exist for a given object. This provides the under-the-hood power behind pretty much every DSL (domain-specific language) written in Ruby.
In the case of the example, every one of 'r.iv', 'r.mm', and so on is actually a method call to the Roman object. Of course, it doesn't have an 'iv' or an 'mm' method, so instead control is passed to method_missing, which gets the name of the method that was called, as well as whatever arguments were passed.
method_missing then converts the method name from a symbol to a string, and parses it as a Roman number, returning the output as an integer.
It's basically a catch-all for messages that don't match up to any methods. It's used extensively in active record for dynamic finders. It's what lets you write something like this:
SomeModel.find_by_name_and_number(a_name, a_number)
The Model doesn't contain code for that find_by, so method_missing is called which looks at is says - I recognize that format, and carries it out. If it doesn't, then you get a method not found error.
In the Roman example you provide it illustrates how you can extend the functionality of a class without explicitly defining methods.
r.iv is not a method so method_missing catches it and calls roman_to_int on the method id "iv"
It's also useful when you want to handle unrecognized methods elsewhere, like proxies, delegators, and forwarders, as the documentation states.
You do not call "method_missing" (the interpreter calls it). Rather, you define it (override it) for a class which you want to make to be more flexible. As mentioned in other comments, the interpreter will call your version of method_missing when the class (or instance) does not ("explicitly"?) define the requested method. This gives you a chance to make something up, based on the ersatz method/message name.
Have you ever done any "reflection" programming in Java? Using this method would be as if the class to be accessed via reflection could look at the string (excuse me, "symbol") of the method name if a no-such-method exception was thrown, and then make something up as that method's implementation on the fly.
Dynamic programming is kind of a "one-up" on reflection.
Since you mention web apps I'll guess that you are familiar with Ruby on Rails. A good example of how method_missing can be used is the different find_by_<whatever> methods that's available. None of those methods actually exist! They are synthesized during run time. All of this magic happens because of how ruby intercepts invocations of non-existing methods.
You can read more about that and other uses of method_missing here.
ActiveRecord uses method_missing to define find_by methods. But since, method_missing is basically a last resort method, this can be a serious performance bottleneck. What I discovered is that ActiveRecord does some further awesome metaprogramming by defining the new finder method as a class method !! Thus, any further calls to the same finder method would not hit the method_missing because it is now a class method. For details about the actual code snippet from base.rb, click here.