After reading http://www.seejohncode.com/2012/03/16/ruby-class-allocate/ and looking further into the allocate method (http://www.ruby-doc.org/core-1.9.3/Class.html#method-i-allocate), I became very curious.
Ruby was built so that we don't have to manually allocate or free memory for objects, yet we are given the ability to allocate objects ourselves. Why?
What are the uses of allocating objects manually in Ruby? The article I read showed a custom initialize method, but are its uses really that limited?
The main reason allocate exists is to allow you to build custom constructors for your objects. As the article you linked mentioned, you can envision the SomeClass.new method as doing something like the following by default:
class SomeClass
  def self.new(*a, &b)
    obj = allocate
    # initialize is a private instance method by default!
    obj.send(:initialize, *a, &b)
    obj
  end
end
Despite what the documentation says, the existence of the allocate method is not so much about memory management as it is about providing some finer grained control over the object creation lifecycle. Most of the time, you won't need this feature, but it is useful for certain edge cases.
For example, in the Newman mail framework, I used this technique to implement a fake constructor for a TestMailer object; it implemented the new method for API compatibility, but actually returned a single instance regardless of how many times it was called:
class Newman::TestMailer
  def self.new(settings)
    return self.instance if instance

    # do some Mail gem configuration stuff here

    self.instance = allocate
  end

  # the accessor must live on the singleton class so that self.instance works
  class << self
    attr_accessor :instance
  end
end
I've not seen many other use cases apart from redefining new as shown above (although I imagine that some weird serialization stuff also uses this feature). But with that in mind, it's worth pointing out that Ruby consistently provides these kinds of extension points, regardless of whether or not you'll need to use them regularly. Robert Klemme has a great article called The Complete Class which I strongly recommend reading if you want to see just how far this design concept has been taken in Ruby :-)
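As a sketch of the serialization idea (with a hypothetical Point class), a deserializer can use allocate to rebuild an object from saved state without running its constructor:

class Point
  def initialize(x, y)
    raise ArgumentError, "coordinates required" if x.nil? || y.nil?
    @x, @y = x, y
  end
end

# Rebuild an instance from a hash of instance variables, skipping initialize.
def revive(klass, ivars)
  obj = klass.allocate
  ivars.each { |name, value| obj.instance_variable_set(name, value) }
  obj
end

point = revive(Point, :@x => 1, :@y => 2)
point.instance_variable_get(:@x) # => 1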
Related
Some open source code I'm integrating in my application has some classes that include code to that effect:
class SomeClass < SomeParentClass
  def self.new(options = {})
    super().tap { |o|
      # do something with `o` according to `options`
    }
  end

  def initialize(options = {})
    # initialize some data according to `options`
  end
end
As far as I understand, both self.new and initialize do the same thing - the latter "during construction" and the former "after construction" - and it looks to me like a horrible pattern to use. Why split the object initialization into two parts, where one is obviously "The Wrong Thing(tm)"?
Ideally, I'd like to see what is inside the super().tap { |o| block, because although this looks like bad practice, just maybe there is some interaction required before or after initialize is called.
Without context, it is possible that you are just looking at something that works but is not considered good practice in Ruby.
However, maybe the approach of separate self.new and initialize methods allows the framework designer to implement a subclassable part of the framework while still ensuring that the setup the framework requires gets done, without resorting to slightly awkward documentation that demands a specific use of super(). The API is slightly easier to document and cleaner-looking if the end user gets the functionality they expect with just the subclass declaration class MyClass < FrameworkClass and without some additional note like:
When you implement the subclass initialize, remember to put super at the start, otherwise the magic won't work
. . . personally I'd find that design questionable, but I think there would at least be a clear motivation.
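For concreteness, here is a sketch (with hypothetical class names) of the super-based alternative described in that note, where the framework's setup lives in initialize and subclasses must remember to call super:

class FrameworkClass
  def initialize(options = {})
    @framework_ready = true # setup the framework requires
  end
end

class MyClass < FrameworkClass
  def initialize(options = {})
    super # the user must remember this, or "the magic won't work"
    @name = options[:name]
  end
end

MyClass.new(name: "example").instance_variable_get(:@framework_ready) # => true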
There might be deeper Ruby language reasons to have code run in a custom self.new block - for instance, it may allow the constructor to switch or alter the specific object (even returning an object of a different class) before returning it; there is a sketch of this after the examples below. However, I have very rarely seen such things done in practice; there is nearly always some other way of achieving the goals of such code without customising new.
Examples of custom/different Class.new methods raised in the comments:
Struct.new, which can optionally take a class name and returns a dynamically created class (whose own new then returns instances of it).
Single-table inheritance in ActiveRecord, which allows the end user to load an object of unknown class from a table and receive an object of the correct subclass.
The latter one could possibly be avoided with a different ORM design for inheritance (although all such schemes have pros/cons).
The first one (Structs) is core to the language, so has to work like that now (although the designers could have chosen a different method name).
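As a sketch of the class-switching idea mentioned above (hypothetical classes, not taken from either of those libraries), a custom new can return instances of different subclasses depending on its argument:

class Shape
  # Custom constructor: dispatch to a subclass based on the requested kind.
  def self.new(kind)
    klass = { circle: Circle, square: Square }.fetch(kind, self)
    obj = klass.allocate
    obj.send(:initialize)
    obj
  end
end

class Circle < Shape; end
class Square < Shape; end

Shape.new(:circle).class # => Circle
Shape.new(:square).class # => Square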
It's impossible to tell why that code is there without seeing the rest of the code.
However, there is something in your question I want to address:
As far as I understand, both self.new and initialize do the same thing - the latter one "during construction" and the former one "after construction"
They do not do the same thing.
Object construction in Ruby is performed in two steps: Class#allocate allocates a new empty object from the object space and sets its internal class pointer to self. Then, you initialize the empty object with some default values. Customarily, this initialization is performed by a method called initialize, but that is just a convention; the method can be called anything you like.
There is an additional helper method called Class#new which does nothing but perform the two steps in sequence, for the programmer's convenience:
class Class
  def new(*args, &block)
    obj = allocate
    obj.send(:initialize, *args, &block)
    obj
  end

  def allocate
    obj = __MagicVM__.__allocate_an_empty_object_from_the_object_space__
    obj.__set_internal_class_pointer__(self)
    obj
  end
end

class BasicObject
  private def initialize(*) end
end
The constructor new has to be a class method since you start from where there is no instance; you can't be calling that method on a particular instance. On the other hand, an initialization routine initialize is better defined as an instance method because you want to do something specifically with a certain instance. Hence, Ruby is designed to internally call the instance method initialize on a new instance right after its creation by the class method new.
I have a typical OO pattern: one base abstract class (that defines abstract methods) and several classes that implement these abstract methods in class-specific way.
I'm used to writing documentation only once, in the abstract methods, and having it automatically propagate to the several concrete classes (at least that's how it works in Javadoc, Scaladoc, and Doxygen), i.e. I don't need to repeat the same description in all the concrete classes.
However, I couldn't find how to do such propagation in YARD. I've tried, for example:
# Some description of abstract class.
# @abstract
class AbstractClass
  # Some method description.
  # @return [Symbol] some return description
  # @abstract
  def do_something
    raise AbstractMethodException.new
  end
end

class ConcreteClass < AbstractClass
  def do_something
    puts "Real implementation here"
    return :foo
  end
end
What I get:
The code works as expected - i.e. it raises AbstractMethodException if called on the abstract class, and does the job in the concrete class
In YARD, AbstractClass is clearly defined as abstract, ConcreteClass is normal
Method description and return type is good in AbstractClass
Method is said to throw AbstractMethodException in AbstractClass
The method has no description at all and a generic Object return type in ConcreteClass; there's no mention that an abstract method exists in the base class.
What I expect to get:
The method's description and return type are inherited (i.e. copied) into ConcreteClass from the info at AbstractClass.
Ideally, this method would be listed in an "inherited" or "implemented" section of the ConcreteClass description, with a reference link from ConcreteClass#do_something to AbstractClass#do_something.
Is it possible to do so?
I think the issue boils down to what you're trying to do. It looks like you're trying to implement an Interface in Ruby, which makes sense if you're coming from Java or .NET, but isn't really how Ruby developers tend to work.
Here is some info about the typical thinking on interfaces in Ruby: What is java interface equivalent in Ruby?
That said, I understand what you're trying to do. If you don't want your AbstractClass to be implemented directly, but you want to define methods that can be used in a class that behaves like the AbstractClass stipulates (as in Design by Contract), then you probably want to use a Module. Modules work very well for keeping your code DRY, but they don't quite solve your problem related to documenting overridden methods. So, at this point I think you can reconsider how you approach documentation, or at least approach it in a more Ruby-ish way.
Inheritance in Ruby is really (generally speaking from my own experience) only used for a few reasons:
Reusable code and attributes
Default behaviors
Specialization
There are obviously other edge cases, but honestly this is what inheritance tends to be used for in Ruby. That doesn't mean what you're doing won't work or violates some rule; it just isn't typical in Ruby (or most dynamically typed languages). This atypical usage is probably why YARD (and other Ruby doc generators) doesn't do what you expect. That said, creating an abstract class that only defines the methods that must exist in a subclass really gains you very little from a code perspective. Methods not defined will result in a NoMethodError exception being raised anyway, and you can programmatically check whether an object will respond to a method call (or any message, for that matter) from whatever calls the method, using #respond_to?(:some_method) (or other reflective tools for getting at meta stuff). It all comes back to Ruby's use of duck typing.
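For instance, a hypothetical duck-typed check along those lines, instead of relying on an abstract base class:

# `handler` can be any object; we only care whether it quacks like one.
def process(handler)
  unless handler.respond_to?(:do_something)
    raise ArgumentError, "handler must respond to #do_something"
  end
  handler.do_something
end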
For pure documentation, why document a method that you don't actually use? You shouldn't really care about the class of the object being sent or received from calling a method, just what those objects respond to. So don't bother creating your AbstractClass in the first place if it adds no real value here. If it contains methods you actually will call directly without overriding, then create a Module, document them there, and run $ yardoc --embed-mixins to include methods (and their descriptions) defined in mixed-in Modules. Otherwise, document methods where you actually implement them, as each implementation should be different (otherwise why re-implement it).
Here is how I would do something similar to what you're doing:
# An awesome Module chock-full of reusable code
module Stuff
  # A powerful method for doing things with stuff, mostly turning stuff into a Symbol
  def do_stuff(thing)
    if thing.kind_of?(String)
      return thing.to_sym
    else
      return thing.to_s.to_sym
    end
  end
end

# Some description of the class
class ConcreteClass
  include Stuff

  # real (and only) implementation
  def do_something
    puts "Real implementation here"
    return :foo
  end
end

an_instance = ConcreteClass.new
an_instance.do_something    # => :foo
                            # > Real implementation here
an_instance.do_stuff("bar") # => :bar
Running YARD (with --embed-mixins) will include the methods mixed-in from the Stuff module (along with their descriptions) and you now know that any object including the Stuff module will have the method you expect.
You may also want to look at Ruby Contracts, as it may be closer to what you're looking for to absolutely force methods to accept and return only the types of objects you want, but I'm not sure how that will play with YARD.
Not ideal, but you can still use the (see ParentClass#method) construct (documented here). Not ideal because you have to type this manually for every overriding method.
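For example, a minimal sketch using the classes from the question (assuming YARD's (see ...) reference syntax, as linked above):

class ConcreteClass < AbstractClass
  # (see AbstractClass#do_something)
  def do_something
    puts "Real implementation here"
    :foo
  end
end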
That being said, I'm no Yard specialist but given its especially customizable architecture, I'd be surprised that there would be no easy way to implement what you need just by extending Yard, somewhere in the Templates department I guess.
This class takes in a hash, and depending on the input, it converts temperatures.
class Temp
  def initialize(opt = {})
    if opt.include?(:cold)
      @colddegree = opt[:cold]
    end
  end

  def self.from_cold(cel)
    Temp.new(:cold => cel) # <= instance of the class created in a class method
  end
end
An instance of the class is created inside a class method. Why is it necessary to do so, what does it do, and what is the reasoning behind it?
Why would we need to create an instance of a class inside the class itself instead of in the main program?
Why would it be used inside a class method? Can there be a time when it would be required inside a regular instance method?
What is being called and what happens when an instance is created inside a class method? What difference does it make?
Rubyists don't always use the word, but self.from_cold is a factory. This allows you to expose a Temp.from_cold(-40) method signature that programmers consuming your API can understand readily without having to concern themselves with the boilerplate of, say, learning that you have an implicitly required parameter named :cold.
It becomes extra useful when you have a work-performing object that needs to be initialized and then invoked, such as TempConverter.new(cel: -40).to_fahrenheit. Sometimes it's cleaner to expose a TempConverter.cel_to_fahr(-40) option to be consumed by other libraries. It's mostly just a way of hiding complexity inside of this class so that other classes with temp conversion needs don't have to violate the Law of Demeter.
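A minimal sketch of that TempConverter idea (the class and method names come from the paragraph above; the implementation is hypothetical):

class TempConverter
  def initialize(cel:)
    @cel = cel
  end

  def to_fahrenheit
    @cel * 9.0 / 5 + 32
  end

  # Factory method: hides the "construct, then invoke" dance from callers.
  def self.cel_to_fahr(cel)
    new(cel: cel).to_fahrenheit
  end
end

TempConverter.cel_to_fahr(-40) # => -40.0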
An important thing to understand is that unlike C#, JavaScript, or C++, new is not a keyword in Ruby. It's just a message which objects of class Class understand. The default behaviour is to allocate and initialize a new object of that class, but there's nothing stopping you overriding it, for example:
class Foo
  def self.new
    puts "OMG i'm initializing an object"
    super
  end
end
So to answer your last question, it makes no difference where Temp.new is called. In all cases, it sends the message new to the object Temp (which is also a class, but remember that almost everything in Ruby is an object, including classes), which creates and returns a new instance.
I'm not going to attempt to answer your other two questions, because the other answer already does.
Does ruby have something different to other OOP languages (eg: PHP) that makes interfaces useless? Does it have some kind of replacement for this?
Edit:
Some clarifications:
In other languages (eg: PHP), you don't "need" interfaces (they are not mandatory at the code level). You use them to make a contract, to improve the architecture of the software. Therefore, the claim 'in Ruby you don't need interfaces / in other languages you need interfaces because XXX' is false.
No, mixins are not interfaces; they are a completely different thing (PHP 5.4 implements mixins). Have you even used interfaces?
Yes, PHP is OOP. Languages evolve, welcome to the present.
Well, it is generally agreed that when an object is passed around in Ruby it is not type-checked. Interfaces in Java and PHP are a way to affirm that an object complies with a certain contract or "type" (so something might be Serializable, Authorizable, Sequential and whatever else you want).
However, in Ruby there is no formalized notion of a contract in which interfaces would fulfill some meaningful role, since interface conformance is not checked in method signatures. See, for example, Enumerable. When you mix it into your object you are using its functionality, as opposed to declaring that your object is Enumerable. The only benefit of having your object be Enumerable is that having defined each(&blk) you automatically get map, select and friends for free. You can perfectly well have an object which implements all of the methods provided by Enumerable but does not mix in the module, and it would still work.
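For example, a small sketch (with a hypothetical NumberBag class): define only each, mix in Enumerable, and map, select and friends come along for free:

class NumberBag
  include Enumerable

  def initialize(*numbers)
    @numbers = numbers
  end

  # Enumerable only requires #each; everything else is built on top of it.
  def each(&blk)
    @numbers.each(&blk)
  end
end

bag = NumberBag.new(1, 2, 3, 4)
bag.map { |n| n * 2 } # => [2, 4, 6, 8]
bag.select(&:even?)   # => [2, 4]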
For example, any method in Ruby that expects an IO object can be fed something that has nothing to do with an IO; it will then either explode with an error or - if you implemented your IO stub correctly - work just fine, even though the object you passed is not declared to be "IO-ish".
The idea behind this comes from the fact that objects in Ruby are not really glorified hash tables with a tag slapped onto them (which then carries some extra tags telling the interpreter or the compiler that this object has interface X and therefore can be used in context Y), but enclosed entities responding to messages. So if an object responds to a specific message it fulfills the contract, and if it does not respond to that message - well, then an error is raised.
So the absence of interfaces is compensated partially by the presence of Modules (which can contain functionality that you reach for without doing any type promises to the caller/consumer) and partially by the tradition of message-passing as opposed to typed dicts.
You should watch some presentations by Jim Weirich since he touches on the subject extensively.
This question is kind of open-ended, but here is my take:
The purpose of an interface declaration is two things:
Declare to your future self or colleagues what methods this class must have
Declare to your computer what methods this class must have
If we take the second purpose first, Ruby source code is never compiled, so there is never an option to verify the conformance to the interface declaration and warn the developer of any failure to conform. This means that if Ruby had some built-in interface support, it wouldn't have an option to verify the conformance until runtime, where the application will crash anyway, because of the missing implementation.
So back to the first purpose. Code readability. This could make sense and a formal Ruby convention of specifying interfaces might be helpful. For now, you would probably communicate this using comments or specs or - as I would probably prefer - a declarative module inclusion. E.g.
module Shippable
  # This is an interface module. If your class includes this module, make sure it responds to the following methods.

  # Returns an integer fixnum representing weight in grams
  def weight
    raise NotImplementedError.new
  end

  # Returns an instance of the Dimension class.
  def dimensions
    raise NotImplementedError.new
  end

  # Returns true if the entity requires special handling.
  def dangerous?
    raise NotImplementedError.new
  end

  # Returns true if the entity is intended for human consumption and thereby must abide by food shipping regulations.
  def edible?
    raise NotImplementedError.new
  end
end

class Product
  include Shippable
end
A way of enforcing this interface would be by creating a spec that creates an instance of every class that includes the Shippable module, calls the four methods and expects them to not raise NotImplementedError.
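A rough sketch of such a spec (assuming RSpec, the Shippable module above, and that every including class has a no-argument constructor):

require "rspec"

RSpec.describe "the Shippable contract" do
  # Find every loaded class that mixes in Shippable.
  shippable_classes = ObjectSpace.each_object(Class).select { |k| k.include?(Shippable) }

  shippable_classes.each do |klass|
    [:weight, :dimensions, :dangerous?, :edible?].each do |method|
      it "#{klass} implements ##{method}" do
        expect { klass.new.public_send(method) }.not_to raise_error
      end
    end
  end
end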
I'm a 'Ruby person', and I would like interfaces, or something like them.
Not to enforce a contract - because enforcing anything isn't very Ruby, and kind of defeats the point of a dynamic language, and anyway there's no "compilation" step to enforce it at - but to document contracts that client subclasses can choose to conform to (or not, although if they choose not to they can't complain if the code doesn't work).
When I'm faced with this problem, ie, when I'm writing a class or module I expect subclasses to provide methods for, I usually document the methods I expect subclasses to provide like this:
module Enumerable
  def each
    raise NotImplementedError, "Subclasses must provide this method"
  end
end
It's not ideal, but it's a reasonably rare case and it works for me.
As Ruby is duck-typed, no separate interface is needed; the objects only need to implement the common methods. Look at the "classic" example below:
class Duck
  def move
    "I can waddle."
  end
end

class Bird
  def move
    "I can fly."
  end
end

animals = []
animals << Duck.new
animals << Bird.new

animals.each do |animal|
  puts animal.move
end
In this example, the "interface" is the move method, which is implemented by both the Duck and the Bird class.
I believe it's because Ruby is dynamically typed whereas other languages are statically typed. The only reason you'd need to use an interface in PHP is when you use type hinting when passing objects around.
Ruby is very dynamic and duck-typed. Wouldn't that make interfaces kind of useless or overkill? Interfaces force classes to have certain methods available at compile time.
Review this too:
http://en.wikipedia.org/wiki/Duck_typing
Depends what you mean by interface.
If by interface you mean a concrete object that exists in your language that you inherit from or implement then no you don't use interfaces in a language like ruby.
If you mean interface as in objects having some well-documented interface, then yes, of course: objects still have well-documented interfaces; they have attributes and methods that you expect to be there.
I'd agree that interfaces are something that exists in your mind and the documentation and not in the code as an object.
How do I deserialize in Psych to return an existing object, such as a class object?
To do serialization of a class, I can do
require "psych"
class Class
yaml_tag 'class'
def encode_with coder
coder.represent_scalar 'class', name
end
end
yaml_string = Psych.dump(String) # => "--- !<class> String\n...\n"
but if I try doing Psych.load on that, I get an anonymous class, rather than the String class.
The normal deserialization method is Object#init_with(coder), but that only changes the state of the existing anonymous class, whereas I'm wanting the String class.
Psych::Visitors::ToRuby#visit_Psych_Nodes_Scalar(o) has cases where rather than modifying existing objects with init_with, they make sure the right object is created in the first place (for example, calling Complex(o.value) to deserialize a complex number), but I don't think I should be monkeypatching that method.
Am I doomed to working with low level or medium level emitting, or am I missing something?
Background
I'll describe the project, why it needs classes, and why it needs (de)serialization.
Project
The Small Eigen Collider aims to create random tasks for Ruby to run. The initial aim was to see if the different implementations of Ruby (for example, Rubinius and JRuby) returned the same results when given the same random tasks, but I've found that it's also good for detecting ways to segfault Rubinius and YARV.
Each task is composed of the following:
receiver.send(method_name, *parameters, &block)
where receiver is a randomly chosen object, method_name is the name of a randomly chosen method, and *parameters is an array of randomly chosen objects. &block is not very random - it's basically equivalent to {|o| o.inspect}.
For example, if receiver were "a", method_name were :casecmp, and parameters were ["b"], then you'd be calling
"a".send(:casecmp, "b") {|x| x.inspect}
which is equivalent to (since the block is irrelevant)
"a".casecmp("b")
The Small Eigen Collider runs this code and logs the inputs along with the return value. In this example, most implementations of Ruby return -1, but at one stage Rubinius returned +1. (I filed this as a bug, https://github.com/evanphx/rubinius/issues/518, and the Rubinius maintainers fixed it.)
Why it needs classes
I want to be able to use class objects in my Small Eigen Collider. Typically, they would be the receiver, but they could also be one of the parameters.
For example, I found that one way to segfault YARV is to do
Thread.kill(nil)
In this case, receiver is the class object Thread, and parameters is [nil]. (Bug report: http://redmine.ruby-lang.org/issues/show/4367 )
Why it needs (de)serialization
The Small Eigen Collider needs serialization for a couple of reasons. One is that using a random number generator to generate a series of random tasks on every run isn't practical. JRuby has a different built-in random number generator, so even when given the same PRNG seed it would generate different tasks than YARV. Instead, what I do is create the list of random tasks once (on the first run of ruby bin/small_eigen_collider), have that initial run serialize the list of tasks to tasks.yml, and then have subsequent runs of the program (using different Ruby implementations) read in that tasks.yml file to get the list of tasks.
Another reason I need serialization is that I want to be able to edit the list of tasks. If I have a long list of tasks that leads to a segmentation fault, I want to reduce the list to the minimum required to cause a segmentation fault. For example, with the following bug, https://github.com/evanphx/rubinius/issues/643,
ObjectSpace.undefine_finalizer(:symbol)
by itself doesn't cause a segmentation fault, and nor does
Symbol.all_symbols.inspect
but if you put the two together, it did. But I started out with thousands of tasks, and needed to pare them back to just those two tasks.
Does deserialization returning existing class objects make sense in this context, or do you think there's a better way?
Status quo of my current research:
To get the desired behavior working you can use my workaround mentioned above.
Here is the nicely formatted code example:
string_yaml = Psych.dump(Marshal.dump(String))
# => "--- ! \"\\x04\\bc\\vString\"\n"
string_class = Marshal.load(Psych.load(string_yaml))
# => String
Your hack of modifying Class will probably never work, because real class handling isn't implemented in psych/yaml.
You can look at the repo tenderlove/psych, which is the standalone lib.
(Gem: psych - to load it, use: gem 'psych'; require 'psych' and do a check with Psych::VERSION)
As you can see in lines 249-251, objects of the anonymous class Class aren't handled.
Instead of monkeypatching the class Class, I recommend that you contribute to the Psych lib by extending this class handling.
So in my mind the final yaml result should be something like: "--- !ruby/class String"
After a night of thinking about it, I can say this feature would be really nice!
Update
Found a tiny solution which seems to work in the intended way:
code gist: gist.github.com/1012130 (with descriptive comments)
The Psych maintainer has implemented the serialization and deserialization of classes and modules. It's now in Ruby!
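For reference, with a Psych recent enough to include that feature, class objects round-trip directly (a quick sketch; the exact output and loader method depend on your Psych/Ruby version):

require "psych"

yaml = Psych.dump(String) # => "--- !ruby/class 'String'\n"

# Loading class objects needs the unsafe loader (plain Psych.load on older
# Rubys, before it defaulted to safe loading).
Psych.unsafe_load(yaml)   # => String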