How do I deserialize classes in Psych? - ruby

How do I deserialize in Psych to return an existing object, such as a class object?
To do serialization of a class, I can do
require "psych"
class Class
yaml_tag 'class'
def encode_with coder
coder.represent_scalar 'class', name
end
end
yaml_string = Psych.dump(String) # => "--- !<class> String\n...\n"
but if I try doing Psych.load on that, I get an anonymous class, rather than the String class.
The normal deserialization method is Object#init_with(coder), but that only changes the state of the existing anonymous class, whereas I'm wanting the String class.
Psych::Visitors::ToRuby#visit_Psych_Nodes_Scalar(o) has cases where rather than modifying existing objects with init_with, they make sure the right object is created in the first place (for example, calling Complex(o.value) to deserialize a complex number), but I don't think I should be monkeypatching that method.
Am I doomed to working with low level or medium level emitting, or am I missing something?
Background
I'll describe the project, why it needs classes, and why it needs
(de)serialization.
Project
The Small Eigen Collider aims to create random tasks for Ruby to run.
The initial aim was to see if the different implementations of Ruby
(for example, Rubinius and JRuby) returned the same results when given
the same random tasks, but I've found that it's also good for
detecting ways to segfault Rubinius and YARV.
Each task is composed of the following:
receiver.send(method_name, *parameters, &block)
where receiver is a randomly chosen object, and method_name is the
name of a randomly chosen method, and *parameters is an array of
randomly chosen objects. &block is not very random - it's basically
equivalent to {|o| o.inspect}.
For example, if receiver were "a", method_name was :casecmp, and
parameters was ["b"], then you'd be calling
"a".send(:casecmp, "b") {|x| x.inspect}
which is equivalent to (since the block is irrelevant)
"a".casecmp("b")
The Small Eigen Collider runs this code, and logs these inputs and
also the return value. In this example, most implementations of Ruby
return -1, but at one stage, Rubinius returned +1. (I filed this as a
bug https://github.com/evanphx/rubinius/issues/518 and the Rubinius
maintainers fixed the bug)
Why it needs classes
I want to be able to use class objects in my Small Eigen Collider.
Typically, they would be the receiver, but they could also be one of
the parameters.
For example, I found that one way to segfault YARV is to do
Thread.kill(nil)
In this case, receiver is the class object Thread, and parameters is
[nil]. (Bug report: http://redmine.ruby-lang.org/issues/show/4367 )
Why it needs (de)serialization
The Small Eigen Collider needs serialization for a couple of reasons.
One is that using a random number generator to generate a series of
random tasks every time isn't practical. JRuby has a different builtin
random number generator, so even when given the same PRNG seed it'd
give different tasks to YARV. Instead, what I do is I create a list of
random tasks once (the first running of ruby
bin/small_eigen_collider), have the initial running serialize the list
of tasks to tasks.yml, and then have subsequent runnings of the
program (using different Ruby implementations) read in that tasks.yml
file to get the list of tasks.
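A minimal sketch of that write-once, read-many workflow follows. The real layout of tasks.yml is not shown here, so the hash keys and the string-only task representation are assumptions made purely for illustration.

```ruby
require "psych"

# Hypothetical task list: each task is a plain hash of strings, so it
# round-trips through YAML without any custom tags.
tasks = [
  { "receiver" => "a", "method_name" => "casecmp", "parameters" => ["b"] }
]

# First run: serialize the generated tasks once.
yaml = Psych.dump(tasks)

# Later runs (possibly on another Ruby implementation): read the tasks
# back and replay them instead of regenerating them from a PRNG seed.
restored = Psych.load(yaml)
results = restored.map do |task|
  task["receiver"].send(task["method_name"], *task["parameters"])
end
```

In a real run the YAML string would be written to and read from tasks.yml rather than kept in memory.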
Another reason I need serialization is that I want to be able to edit
the list of tasks. If I have a long list of tasks that leads to a
segmentation fault, I want to reduce the list to the minimum required
to cause a segmentation fault. For example, with the following bug
https://github.com/evanphx/rubinius/issues/643 ,
ObjectSpace.undefine_finalizer(:symbol)
by itself doesn't cause a segmentation fault, and nor does
Symbol.all_symbols.inspect
but if you put the two together, it did. But I started out with
thousands of tasks, and needed to pare it back to just those two
tasks.
Does deserialization returning existing class objects make sense in
this context, or do you think there's a better way?

Here is where my research currently stands:
To get the behavior you want, you can use the workaround I mentioned above.
Here is the code example, properly formatted:
string_yaml = Psych.dump(Marshal.dump(String))
# => "--- ! \"\\x04\\bc\\vString\"\n"
string_class = Marshal.load(Psych.load(string_yaml))
# => String
Your hack of modifying Class will probably never work, because real class handling isn't implemented in Psych/YAML.
You can take this repo tenderlove/psych, which is the standalone lib.
(Gem: psych - to load it, use: gem 'psych'; require 'psych' and do a check with Psych::VERSION)
As you can see in lines 249-251, objects of the class Class (including anonymous classes) aren't handled.
Instead of monkeypatching the class Class, I recommend contributing to the Psych lib by extending this class handling.
So in my mind the final yaml result should be something like: "--- !ruby/class String"
After thinking about it overnight, I can say this feature would be really nice!
Update
Found a tiny solution which seems to work in the intended way:
code gist: gist.github.com/1012130 (with descriptive comments)

The Psych maintainer has implemented the serialization and deserialization of classes and modules. It's now in Ruby!
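A minimal sketch of the built-in round trip, assuming a reasonably recent Psych. Psych 4 made Psych.load safe by default, so the sketch picks the unrestricted loader when it exists; the fallback to Psych.load for older releases is an assumption about the installed version.

```ruby
require "psych"

# Dump the class object itself; no monkeypatching of Class is needed.
yaml = Psych.dump(String)

# Psych 4 provides unsafe_load for trusted input; older releases only
# have the permissive Psych.load.
loader = Psych.respond_to?(:unsafe_load) ? :unsafe_load : :load
restored = Psych.public_send(loader, yaml)

# restored is the existing String class, not a new anonymous class.
same_object = restored.equal?(String)
```

Only use the unrestricted loader on YAML you generated yourself, such as a task list written by an earlier run.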

Related

Using Module#prepend to interrogate a str_enum

I have a Rails model, which is using the str_enum gem.
I'm building a generator which reads the models and creates pages for them, and so I'd like to be able to understand what str_enums are attached to a model.
For example
class User < ApplicationRecord
str_enum :email_frequency, %i[every daily weekly], default: 'every'
end
Ideally, I'd like to be able to query the User model and understand there is a str_enum attached to email_frequency, with values of every, daily & weekly.
Once I can understand there is a str_enum attached to a given field, I can pluralize the field and get the values:
irb(main):004:0> User.email_frequencies
=> ["every", "daily", "weekly"]
The question has also been asked over here and the suggestion is to use Module#prepend. I'm familiar with prepend to conditionally insert methods into a model.
How can I use it for this problem?
EDIT
This is quite simple with validations, for example: get validations from model
If I understand your question correctly, you want to list every column that has a string enum attached. If so, you can override the gem's method like this:
# lib/extensions/str_enum.rb
module Extensions
module StrEnum
module ClassMethods
def str_enum(column, *args)
self.str_enums << column.to_s.pluralize
super
end
end
def self.prepended(base)
class << base
mattr_accessor :str_enums
self.str_enums = []
prepend ClassMethods
end
end
end
end
In the User model
prepend Extensions::StrEnum
Now you can use
User.str_enums
to list all columns that have a str_enum attached.
Make sure you have added the lib directory to the load path.
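The same idea can be tried outside Rails. In the sketch below, FakeStrEnum is a hypothetical stand-in for the gem's macro (it is not the real str_enum implementation), and StrEnumTracking is an illustrative module that records each call before handing over to the original with super.

```ruby
# Hypothetical stand-in for the gem's macro, so the sketch runs without Rails.
module FakeStrEnum
  def str_enum(column, values, **_options)
    define_singleton_method("#{column}_list") { values.map(&:to_s) }
  end
end

# Prepended to the singleton class, so it is found before the macro
# and can record every call.
module StrEnumTracking
  def str_enums
    @str_enums ||= {}
  end

  def str_enum(column, values, **options)
    str_enums[column] = values.map(&:to_s)
    super
  end
end

class Model
  extend FakeStrEnum
  singleton_class.prepend StrEnumTracking
end

class User < Model
  str_enum :email_frequency, %i[every daily weekly], default: 'every'
end

tracked = User.str_enums
```

Because the registry lives in a class-level instance variable, each model keeps its own list and the base class stays empty.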
So for starters, you could, of course, use the approach that Ninh Le has described and monkeypatch your desired behavior into the gem. In fact, I'm fairly confident that it would work, since your use case is currently relatively easy and you really just need to keep track of all the times the str_enum method gets called.
I would, however, encourage you to consider doing one of two things:
If you plan to do more complex stuff with your enums, consider using one of the more heavy-duty enum gems like enumerize, enumerate_it or active_enum. All of these are packages that have been around for a decade (give or take) and still receive support and all of them have been built with a certain degree of extensibility and introspection in mind (albeit with different approaches).
Have a look at the gem and consider building your own little macro on top of it. IMO the biggest weakness of several of Andrew Kane's libraries is arguably their somewhat hacky, scripty approach. That makes the libraries hard to extend, but it also makes them inherently easy to understand and thus easy to use as a basis for your own stuff (whereas gems with a more elaborate approach are harder to understand and to adapt beyond what the author intended).
Either way, you'll be fine with both of my suggestions as well as Ninh Le's.

How to declare what is needed to include a Ruby module

Ruby doesn't have interfaces, but how do I tell other programmers what a class needs in order to include the current module, such as instance variables, methods, constants, etc.?
There's no way of formally defining what's required; it's up to you to document it clearly. The reason for this is that Ruby is very dynamic by design, so static tests won't work: the problems they detect might be rectified by the time the code is actually executed. Likewise, something that might seem correct could be broken later on by some other code.
C++, Java, and even Objective-C and Swift can do compile-time checking to enforce these things. Once a class is defined it cannot be undefined. Once a method is created it can't be removed. This is not the case in a Smalltalk-derived language like Ruby.
Ruby has no ability to test these things up-front since what the program actually does can change radically from the time the code is loaded and parsed, and when it's actually executed.
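To make the point concrete, here is a small sketch (the Greeter class is made up for illustration) in which a method disappears from a live class and then comes back with different behavior, all after the code has been loaded.

```ruby
class Greeter
  def greet
    "hello"
  end
end

greeter = Greeter.new
before = greeter.respond_to?(:greet)

# Remove the method from the live class; existing instances lose it too.
Greeter.send(:remove_method, :greet)
after_removal = greeter.respond_to?(:greet)

# Define it again with different behavior, still at runtime.
Greeter.send(:define_method, :greet) { "hi again" }
after_redefinition = greeter.greet
```

A check performed when the class was first loaded would have been wrong twice by the end of this snippet.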
If you have a particularly complicated footprint you might want to write a method for testing it that can be exercised to verify that everything's working correctly. That can be called by the programmer whenever they think they're ready.
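One possible shape for such a check, as a sketch only: the module name Priceable, its REQUIRED_METHODS list, and the missing_methods_for helper are all hypothetical names chosen for this example.

```ruby
module Priceable
  # Methods the including class is expected to supply (hypothetical names).
  REQUIRED_METHODS = %i[net_price tax_rate].freeze

  # Returns the required methods that klass does not define yet.
  def self.missing_methods_for(klass)
    REQUIRED_METHODS.reject { |name| klass.method_defined?(name) }
  end

  def gross_price
    net_price * (1 + tax_rate)
  end
end

class Book
  include Priceable
  def net_price; 10.0; end
  def tax_rate; 0.5; end
end

class Pamphlet
  include Priceable
  def net_price; 2.0; end
  # tax_rate is missing on purpose
end

book_gaps = Priceable.missing_methods_for(Book)
pamphlet_gaps = Priceable.missing_methods_for(Pamphlet)
```

The check is explicit and can be called from a test or a console session once the programmer believes the class is complete.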
The only way to verify that your Ruby code is running correctly is to run it. No amount of static analysis will ever come close to that.
There are some existing mixins, classes, and methods in the Ruby core library that have the exact same problem, e.g. Enumerable, Comparable, Range, Hash, Array#uniq: they require certain behavior from other objects in order to work. Some examples are:
Enumerable:
The class must provide a method each, which yields successive members of the collection. If Enumerable#max, #min, or #sort is used, the objects in the collection must also implement a meaningful <=> operator […]
Comparable:
The class must define the <=> operator, which compares the receiver against another object, returning -1, 0, or +1 depending on whether the receiver is less than, equal to, or greater than the other object. If the other object is not comparable then the <=> operator should return nil.
Range:
Ranges can be constructed using any objects that can be compared using the <=> operator. Methods that treat the range as a sequence (#each and methods inherited from Enumerable) expect the begin object to implement a succ method to return the next object in sequence. The step and include? methods require the begin object to implement succ or to be numeric.
Hash:
A user-defined class may be used as a hash key if the hash and eql? methods are overridden to provide meaningful behavior.
And in order to define what "meaningful behavior" means, the documentation of Hash further links to the documentation of Object#hash and Object#eql?:
Object#hash:
[…] This function must have the property that a.eql?(b) implies a.hash == b.hash. […]
Object#eql?:
[…] The eql? method returns true if obj and other refer to the same hash key. […]
So, as you can see, your question is a quite common one, and the answer is: documentation.
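As an illustration of the Hash contract quoted above, here is a sketch with a made-up Point class. It overrides hash and eql? so that two distinct objects with the same coordinates refer to the same hash key.

```ruby
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Two points are the same key when their coordinates match.
  def eql?(other)
    other.is_a?(Point) && x == other.x && y == other.y
  end
  alias_method :==, :eql?

  # Must agree with eql?: equal points produce equal hash values.
  def hash
    [x, y].hash
  end
end

lookup = { Point.new(1, 2) => "found" }
hit = lookup[Point.new(1, 2)]
miss = lookup[Point.new(2, 1)]
```

Nothing in the language forces Point to define both methods together; only the documentation says so.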
This is a subjective question, so with that in mind here is my opinion.
Maybe you could create a base class that your objects inherit from.
for example:
class BaseA
def say(msg)
raise NotImplementedError
end
end
class A < BaseA
def say(msg)
puts "saying #{msg}"
end
end
Even though this isn't a real interface, you can "pretend" it is one: have all the classes that need BaseA's methods inherit from it and override them with the real behavior. Then developers can just look at the base class to see which methods need to be implemented.

Class#allocate and its uses

After having read http://www.seejohncode.com/2012/03/16/ruby-class-allocate/ and looking more into the allocate method: http://www.ruby-doc.org/core-1.9.3/Class.html#method-i-allocate I became very curious.
Ruby was built in a way that we did not have to manually allocate or free space for/with objects, but we are given the ability to do so. Why?
What are the uses in Ruby of allocating Objects manually? The article I read showed a custom initialize method, but are the uses of it so limited?
The main reason allocate exists is to allow you to build custom constructors for your objects. As the article you linked mentioned, you can envision the SomeClass.new method as doing something like the following by default:
class SomeClass
  def self.new(*a, &b)
    obj = allocate
    # initialize is a private instance method by default!
    obj.send(:initialize, *a, &b)
    # return the new object, not whatever initialize returned
    obj
  end
end
Despite what the documentation says, the existence of the allocate method is not so much about memory management as it is about providing some finer grained control over the object creation lifecycle. Most of the time, you won't need this feature, but it is useful for certain edge cases.
For example, in the Newman mail framework, I used this technique to implement a fake constructor for a TestMailer object; it implemented the new method for API compatibility, but actually returned a single instance regardless of how many times it was called:
class Newman::TestMailer
  class << self
    # the single instance is stored on the class itself
    attr_accessor :instance
  end

  def self.new(settings)
    return self.instance if instance
    # do some Mail gem configuration stuff here
    self.instance = allocate
  end
end
I've not seen many other use cases apart from redefining new as shown above (although I imagine that some weird serialization stuff also uses this feature). But with that in mind, it's worth pointing out that Ruby consistently provides these kinds of extension points, regardless of whether or not you'll need to use them regularly. Robert Klemme has a great article called The Complete Class which I strongly recommend reading if you want to see just how far this design concept has been taken in Ruby :-)
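To see the split between allocation and initialization described above, consider this small sketch (the Widget class is invented for the example).

```ruby
class Widget
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

built = Widget.new("gear") # allocate, then initialize
bare = Widget.allocate     # allocate only: initialize never runs

bare_is_widget = bare.is_a?(Widget)
bare_name = bare.name
bare_state = bare.instance_variables
```

The allocated object has the right class but none of the state that initialize would have set up, which is exactly what a custom constructor or a deserializer wants to fill in itself.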

Why do Ruby people say they don't need interfaces?

Does ruby have something different to other OOP languages (eg: PHP) that makes interfaces useless? Does it have some kind of replacement for this?
Edit:
Some clarifications:
In other languages (eg: PHP), you don't "need" interfaces (they are not mandatory at code level). You use them to make a contract, to improve the architecture of the software. Therefore, the affirmation 'in ruby you don't need interfaces / in other languages you need interfaces because XXX' is false.
No, mixins are not interfaces, they are a completely different thing (PHP 5.4 implements mixins). Have you even used interfaces?
Yes, PHP is OOP. Languages evolve, welcome to the present.
Well, it's a consensus that when an object is passed in Ruby it's not type-checked. Interfaces in Java and PHP are a way to affirm that an object complies to a certain contract or "type" (so something might be Serializable, Authorizable, Sequential and whatever else that you want).
However, in Ruby there is no formalized notion of a contract for which interfaces would fulfill some meaningful role as interface conformance is not checked in method signatures. See, for example, Enumerable. When you mix it into your object you are using its functionality as opposed to declaring that your object is Enumerable. The only benefit of having your object being Enumerable is that having defined each(&blk) you automatically get map, select and friends for free. You can perfectly have an object which implements all of the methods provided by Enumerable but does not mix in the module and it would still work.
For example, for any method in Ruby that expects an IO object you could feed in something that has nothing to do with an IO, and then it would explode with an error or - if you implemented your IO stub correctly - it will work just fine even though your passed object is not declared to be "IO-ish".
The idea behind that comes from the fact that objects in Ruby are not really glorified hash tables with a tag slapped onto them (which then have some extra tags that tell the interpreter or the compiler that this object has interface X therefore it can be used in context Y) but an enclosed entity responding to messages. So if an object responds to a specific message it fulfills the contract, and if it does not respond to that message - well then an error is raised.
So the absence of interfaces is compensated partially by the presence of Modules (which can contain functionality that you reach for without doing any type promises to the caller/consumer) and partially by the tradition of message-passing as opposed to typed dicts.
You should watch some presentations by Jim Weirich since he touches on the subject extensively.
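The IO example from this answer can be sketched as follows. The write_report method and the LineCollector class are hypothetical; the point is only that the method never asks what its argument is.

```ruby
require "stringio"

# Only cares that its argument responds to write.
def write_report(io)
  io.write("status: ok\n")
end

# Not an IO subclass and declares nothing; it just responds to write.
class LineCollector
  attr_reader :lines

  def initialize
    @lines = []
  end

  def write(text)
    @lines << text
  end
end

buffer = StringIO.new
collector = LineCollector.new
write_report(buffer)
write_report(collector)

# An object that lacks write fails only when the message is sent.
failure = nil
begin
  write_report(Object.new)
rescue NoMethodError => error
  failure = error.class
end
```

Both the StringIO and the hand-written collector satisfy the "contract" simply by responding to the message.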
This question is kind of open-ended, but here is my take:
An interface declaration serves two purposes:
Declare to your future self or colleagues what methods this class must have
Declare to your computer what methods this class must have
If we take the second purpose first, Ruby source code is never compiled ahead of time, so there is never an opportunity to verify conformance to the interface declaration and warn the developer of any failure to conform. This means that if Ruby had some built-in interface support, it couldn't verify conformance until runtime, where the application will crash anyway because of the missing implementation.
So back to the first purpose. Code readability. This could make sense and a formal Ruby convention of specifying interfaces might be helpful. For now, you would probably communicate this using comments or specs or - as I would probably prefer - a declarative module inclusion. E.g.
module Shippable
# This is an interface module. If your class includes this module, make sure it responds to the following methods
# Returns an integer fixnum representing weight in grams
def weight
raise NotImplementedError.new
end
# Returns an instance of the Dimension class.
def dimensions
raise NotImplementedError.new
end
# Returns true if the entity requires special handling.
def dangerous?
raise NotImplementedError.new
end
# Returns true if the entity is intended for human consumption and thereby must abide by food shipping regulations.
def edible?
raise NotImplementedError.new
end
end
class Product
include Shippable
end
A way of enforcing this interface would be by creating a spec that creates an instance of every class that includes the Shippable module, calls the four methods and expects them to not raise NotImplementedError.
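A rough sketch of such a check in plain Ruby rather than a spec framework. It uses a trimmed-down copy of Shippable with only two methods, and the Parcel and Crate classes and the unimplemented_methods helper are hypothetical names for this example.

```ruby
# Trimmed-down copy of the Shippable module (two methods only).
module Shippable
  def weight
    raise NotImplementedError
  end

  def dangerous?
    raise NotImplementedError
  end
end

class Parcel
  include Shippable
  def weight; 250; end
  def dangerous?; false; end
end

class Crate
  include Shippable
  def weight; 9_000; end
  # dangerous? is deliberately left unimplemented
end

# Calls every interface method and collects the ones that still raise.
def unimplemented_methods(object)
  Shippable.instance_methods(false).select do |name|
    begin
      object.public_send(name)
      false
    rescue NotImplementedError
      true
    end
  end
end

parcel_gaps = unimplemented_methods(Parcel.new)
crate_gaps = unimplemented_methods(Crate.new)
```

In a real spec the same loop would run over every class that includes the module and fail the example when the list is not empty.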
I'm a 'Ruby person', and I would like interfaces, or something like them.
Not to enforce a contract - because enforcing anything isn't very Ruby, and kind of defeats the point of a dynamic language, and anyway there's no "compilation" step to enforce it at - but to document contracts that client subclasses can choose to conform to (or not, although if they choose not to they can't complain if the code doesn't work).
When I'm faced with this problem, ie, when I'm writing a class or module I expect subclasses to provide methods for, I usually document the methods I expect subclasses to provide like this:
module Enumerable
def each
raise NotImplementedError, "Subclasses must provide this method"
end
end
It's not ideal, but it's a reasonably rare case and it works for me.
Because Ruby is duck-typed, no separate interface is needed; the objects only need to implement the common methods. Look at the "classic" example below:
class Duck
def move
"I can waddle."
end
end
class Bird
def move
"I can fly."
end
end
animals = []
animals << Duck.new
animals << Bird.new
animals.each do |animal|
puts animal.move
end
In this example, the "interface" is the move method, which is implemented by both the Duck and the Bird class.
I believe it's because Ruby is dynamically typed whereas other languages are statically typed. The only reason you'd need to use an interface in PHP is when you use type hinting when passing objects around.
Ruby is very dynamic and duck-typed. Wouldn't that make interfaces kind of useless or overkill? Interfaces force classes to have certain methods available at compile time.
Review this too:
http://en.wikipedia.org/wiki/Duck_typing
Depends what you mean by interface.
If by interface you mean a concrete object that exists in your language that you inherit from or implement then no you don't use interfaces in a language like ruby.
If you mean interface as in objects have some well-documented interface, then yes, of course: objects still have well-documented interfaces, with attributes and methods that you expect to be there.
I'd agree that interfaces are something that exists in your mind and the documentation and not in the code as an object.

What are some good examples of Mixins and or Traits?

I was reading up on Ruby, and learned about its mixins pattern, but couldn't think of much useful mixin functionality (most likely because I'm not used to thinking that way). So I was wondering what would be good examples of useful mixin functionality?
Thanks
Edit: A bit of background. I'm coming from C++ and other object-oriented languages, but my doubt here is that Ruby says mixins are not inheritance, yet I keep seeing mixins as multiple inheritance, so I fear I'm trying to categorize them too soon into my comfort zone, and not really grokking what a mixin is.
They are usually used to add some form of standard functionality to a class, without having to redefine it all. You can probably think of them a bit like interfaces in Java, but instead of just defining a list of methods that need to be implemented, many of them will actually be implemented by including the module.
There are a few examples in the standard library:
Singleton - A module that can be mixed into any class to make it a singleton. The initialize method is made private, and an instance method added, which ensures that there is only ever one instance of that class in your application.
Comparable - If you include this module in a class, defining the <=> method, which compares the current instance with another object and says which is greater, is enough to provide <, <=, ==, >=, >, and between? methods.
Enumerable - By mixing in this module, and defining an each method, you get support for all the other related methods such as collect, inject, select, and reject. If it's also got the <=> method, then it will also support sort, min, and max.
DataMapper is also an interesting example of what can be done with a simple include statement, taking a standard class, and adding the ability to persist it to a data store.
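The Comparable entry in the list above can be sketched like this. The Version class is invented for the example; the only method it supplies for comparison is <=>.

```ruby
class Version
  include Comparable
  attr_reader :major, :minor

  def initialize(major, minor)
    @major = major
    @minor = minor
  end

  # The one method Comparable needs from the including class.
  def <=>(other)
    [major, minor] <=> [other.major, other.minor]
  end
end

old_release = Version.new(1, 2)
new_release = Version.new(1, 10)

older = old_release < new_release
same = old_release == Version.new(1, 2)
in_range = new_release.between?(old_release, Version.new(2, 0))
```

The <, == and between? calls all come from the mixin and are built on top of the single <=> definition.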
Well the usual example I think is Persistence
module Persistence
  def load sFileName
    puts "load code to read #{sFileName} contents into my_data"
  end
  def save sFileName
    puts "Uber code to persist #{@my_data} to #{sFileName}"
  end
end
class BrandNewClass
  include Persistence
  attr :my_data
  def data=(someData)
    @my_data = someData
  end
end
b = BrandNewClass.new
b.data = "My pwd"
b.save "MyFile.secret"
b.load "MyFile.secret"
Imagine the module is written by a Ruby ninja, which persists the state of your class to a file.
Now suppose I write a brand new class, I can reuse the functionality of persistence by mixing it in by saying include ModuleILike. You can even include modules at runtime. I get load and save methods for free by just mixing it in. These methods are just like the ones that you wrote yourself for your class. Code/Behavior/Functionality-reuse without inheritance!
So what you're doing is including methods to the method table for your class (not literally correct but close).
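Mixing in at runtime, as mentioned above, might look like the following sketch. The Auditable module and the Invoice and Receipt classes are hypothetical names used only here.

```ruby
module Auditable
  def audit_label
    "#{self.class.name} (audited)"
  end
end

class Invoice; end
class Receipt; end

# Mix the module into a whole class after the class was defined.
Receipt.include(Auditable)

# Or mix it into one single object, leaving its class untouched.
one_off = Invoice.new
one_off.extend(Auditable)

receipt_label = Receipt.new.audit_label
one_off_label = one_off.audit_label
other_invoice_has_it = Invoice.new.respond_to?(:audit_label)
```

Only the extended Invoice object gains the method; other Invoice instances are unaffected.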
In ruby, the reason that Mixins aren't multiple-inheritance is that combining mixin methods is a one time thing. This wouldn't be such a big issue, except that Ruby's modules and classes are open to modification. This means that if you mixin a module to your class, then add a method to the module, the method will not be available to your class; where if you did it in the opposite order, it would.
It's like ordering an ice-cream cone. If you get chocolate sprinkles and toffee bits as your mixins, and walk away with your cone, what kind of ice cream cone you have won't change if someone adds multicolored sprinkles to the chocolate sprinkles bin back at the ice-cream shop. Your class, the ice cream cone, isn't modified when the mixin module, the bin of sprinkles is. The next person to use that mixin module will see the changes.
When you include a module in ruby, it calls Module#append_features on that module, which adds a copy of that module's methods to the includer one time.
Multiple inheritance, as I understand it, is more like delegation. If your class doesn't know how to do something, it asks its parents. In an open-class environment, a class's parents may have been modified after the class was created.
It's like a RL parent-child relationship. Your mother might have learned how to juggle after you were born, but if someone asks you to juggle and you ask her to either: show you how (copy it when you need it) or do it for you (pure delegation), then she'll be able at that point, even though you were created before her ability to juggle was.
It's possible that you could modify a ruby module 'include' to act more like multiple inheritance by modifying Module#append_features to keep a list of includers, and then to update them using the method_added callback, but this would be a big shift from standard Ruby, and could cause major issues when working with others' code. You might be better off creating a Module#inherit method that called include and handled delegation as well.
As for a real world example, Enumerable is awesome. If you define #each and include Enumerable in your class, then that gives you access to a whole host of iterators, without you having to code each and every one.
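A short sketch of that, using a made-up Playlist class whose only hand-written iteration method is each.

```ruby
class Playlist
  include Enumerable

  def initialize(*tracks)
    @tracks = tracks
  end

  # The one method Enumerable needs from the including class.
  def each(&block)
    return enum_for(:each) unless block
    @tracks.each(&block)
    self
  end
end

list = Playlist.new("b", "a", "c")

shouted = list.map(&:upcase)
sorted = list.sort
first_two = list.first(2)
has_a = list.include?("a")
```

The map, sort, first and include? calls are all provided by Enumerable; sort works here because the elements are strings, which already implement <=>.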
It is largely used as one might use multiple inheritance in C++ or implementing interfaces in Java/C#. I'm not sure where your experience lies, but if you have done those things before, mixins are how you would do them in Ruby. It's a systemized way of injecting functionality into classes.
