Modify an object before marshaling it in Ruby

I have an object containing sensitive data that I want to marshal (using Marshal) without the sensitive data.
I'd like to be able to say:
def _dump(*args)
  # Clean-up sensitive data
  super
end
but this produces a 'no superclass method' error. Is there a way I can make my object behave the way I want in response to Marshal.dump, while using the default implementation?
I want Marshal.dump(my_obj) to work out-of-the-box without requiring the API consumer to remember to call a different method.

There is no superclass method for _dump, which is why super fails. If _dump is defined on your object it is called; if not, the default handler is used.
You probably want to clone your object and remove the sensitive fields, returning that as a Hash from marshal_dump, then undo that within marshal_load. (_dump itself must return a String, which _load then parses, so the marshal_dump/marshal_load pair is the easier fit for a Hash.)
You can also read the documentation on Marshal where it describes the recommended methods.
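A minimal sketch of that approach, assuming a hypothetical Account class with a sensitive @password instance variable (the class and attribute names are illustrative, not from the question). It relies on marshal_dump and marshal_load so that Marshal.dump(my_obj) keeps working unchanged for API consumers:

```ruby
# Hypothetical class; Account, @name and @password are illustrative names.
class Account
  attr_reader :name, :password

  def initialize(name, password)
    @name = name
    @password = password
  end

  # Called by Marshal.dump; may return any marshalable object.
  # The sensitive field is simply left out of the returned Hash.
  def marshal_dump
    { name: @name }
  end

  # Called by Marshal.load on a freshly allocated instance
  # (initialize is not run), with the value marshal_dump returned.
  def marshal_load(data)
    @name = data[:name]
    @password = nil
  end
end

original = Account.new("alice", "s3cret")
restored = Marshal.load(Marshal.dump(original))
restored.name     # => "alice"
restored.password # => nil
```

The original object is never modified here, so there is nothing to clone or clean up before dumping.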

Related

In Ruby, what are the use cases for adding methods to an instance's singleton class?

Thanks to some other posts and reading, I understand singleton/meta classes. And I understand why we'd want to use them on a class. But I still don't understand why we'd want to use them on instance objects. And I've yet to see it in practice.
I'm referring to something like this:
class Vehicle
  def odometer_reading
    # some code
  end
end
my_car = Vehicle.new
def my_car.open_door
  # some code
end
At first thought, this seems like a bad idea as it would lead to difficulties in understanding the code and debugging.
Why would we want to do this? What are some examples of when this is a good idea?
One example is using it for testing purposes: creating mock and double objects, stubbing methods. Debugging is somewhere nearby: re-defining the logging method for a specific object that you suspect is mis-behaving, so that the log info is printed directly to console (or more info is printed) during the debug session.
Another example is handling special cases without resorting to inheritance. Take the classic example of two types of Employees, say Engineers and SalesPersons, whose compensation rules differ: you put the common logic into the Employee class, inherit the other two classes from it, and implement their own calculate_salary methods there. Now, if there is an outlier (a star salesman with whom you have agreed a different compensation scheme, a CEO with a very special scheme, etc.), instead of creating a whole subclass for this special employee you can define the method on the specific object representing that employee.
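As a rough illustration of the outlier case, here is a sketch with hypothetical Employee and SalesPerson classes; the salary figures and the bonus rule are made up for the example:

```ruby
# Hypothetical classes for illustration only.
class Employee
  attr_reader :base_salary

  def initialize(base_salary)
    @base_salary = base_salary
  end

  def calculate_salary
    base_salary
  end
end

class SalesPerson < Employee
  # Ordinary sales staff: base salary plus a flat bonus.
  def calculate_salary
    base_salary + 500
  end
end

regular = SalesPerson.new(3000)
star    = SalesPerson.new(3000)

# Override the method on this one object only; the class is untouched.
def star.calculate_salary
  base_salary * 2
end

regular.calculate_salary # => 3500
star.calculate_salary    # => 6000
star.singleton_methods   # => [:calculate_salary]
```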
The third example concerns object lifecycle and performance: redefining a method can replace a long case over various states in some processing method. Consider a file-reading class that transparently caches the entire file in the background (too simplistic for real life, I know, but it serves as a model). While the file is not entirely read, every read request must check whether the requested data is already in the cache or should be read from disk. Once the file is fully read, requests are always served from the cache. Instead of an if (or a case, if there are more states) to deal with this, you can simply redefine the read method at the object level once the file is fully cached. For this simple example it doesn't yield any sizable performance benefit (if any at all), but for more complex cases it may be worth it.
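The caching idea could look roughly like this. CachingReader and its helper methods are hypothetical, and an in-memory array stands in for the file on disk so the sketch runs on its own:

```ruby
# Hypothetical model: CachingReader and its method names are illustrative.
class CachingReader
  def initialize(lines)
    @source = lines   # stands in for the file on disk
    @cache  = []
  end

  # Initial behaviour: fill the cache lazily and check the state on every call.
  def read(index)
    load_next while @cache.size <= index && @cache.size < @source.size
    switch_to_cache_only if @cache.size == @source.size
    @cache[index]
  end

  # Copy the next uncached line from the source into the cache.
  def load_next
    @cache << @source[@cache.size]
  end

  # Once everything is cached, replace read on this object with a
  # version that skips the state check entirely.
  def switch_to_cache_only
    define_singleton_method(:read) { |index| @cache[index] }
  end
end

reader = CachingReader.new(%w[a b c])
reader.read(2)            # loads all three lines, then swaps read
reader.singleton_methods  # => [:read]
reader.read(0)            # => "a", served by the singleton method
```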
You wouldn't normally add them using def, which is a rather rigid way of doing it, but instead by using something like define_method or extend.
Although this is not the sort of thing you'd do on a routine basis, it does mean you can do some rather unusual things. ActiveRecord in Rails produces results in the form of an Array with additional methods added on to perform other operations.
An Object-Relational Mapper would be a case where you'd probably want to do this. Sometimes, depending on how you fetch a record, the methods available differ significantly. Being able to add those dynamically means each fetched object can be completely customized even if they have the same class and general-purpose methods.
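A small sketch of the dynamic route mentioned above. It uses extend plus define_singleton_method (the per-object counterpart of define_method); the Auditable module and the method names are hypothetical:

```ruby
# Hypothetical module and object names, for illustration.
module Auditable
  def audit_trail
    @audit_trail ||= []
  end
end

report = Object.new
other  = Object.new

# extend mixes the module into this object's singleton class only.
report.extend(Auditable)

# define_singleton_method takes the method name as data, so it can be computed.
%w[draft final].each do |state|
  report.define_singleton_method("#{state}?") { state == "draft" }
end

report.audit_trail << "created"
report.draft?                    # => true
report.final?                    # => false
other.respond_to?(:audit_trail)  # => false
```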
Another example: You have an array of hashes and you want each hash to have a method-call getter and setter. Something like:
user = HashOnSteroids.new(name: 'John')
user[:name] # => 'John'
user[:name] = 'Joe'
user.name # => 'Joe'
user.name = 'John'
user.set(name: 'Jim', age: 5)
This means you cannot write standard method definitions in the class as each hash will have a different set of keys (method names). This means you have to resort to defining singleton methods so each object has its own set of methods (not a pack of shared methods).
Warning: Using singleton methods for this use case is highly inefficient. A sneaky method_missing is faster and uses way less memory, as it doesn't have to allocate a billion proc objects.
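For comparison, a method_missing version of the interface above might look like the following. This is only one possible implementation of the hypothetical HashOnSteroids class, assuming the keys are symbols:

```ruby
# Hypothetical implementation of the HashOnSteroids interface shown above.
class HashOnSteroids
  def initialize(attrs = {})
    @data = attrs.dup
  end

  def [](key)
    @data[key]
  end

  def []=(key, value)
    @data[key] = value
  end

  def set(attrs)
    @data.merge!(attrs)
  end

  # Handle user.name and user.name = value for keys present in @data.
  def method_missing(name, *args)
    key = name.to_s.chomp("=").to_sym
    return super unless @data.key?(key)

    name.to_s.end_with?("=") ? (@data[key] = args.first) : @data[key]
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    @data.key?(name.to_s.chomp("=").to_sym) || super
  end
end

user = HashOnSteroids.new(name: 'John')
user.name = 'Joe'
user.set(name: 'Jim', age: 5)
user.name # => 'Jim'
user.age  # => 5
```

No per-object methods are defined; unknown names still raise NoMethodError through super.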

How to make an attribute transient (not marshalled)

I'm using Ruby 1.8.6 and have a class (not an ActiveRecord class) that I want to cache using memcache-client, which serializes it with Marshal.dump before storing it in the cache. However, it has an instance variable (which does refer to an ActiveRecord class) that I don't want to serialize, as I don't want multiple objects running around corresponding to the same database row. Instead, I want to set the attribute to refer to the appropriate object (which I already have a reference to) after the serialized object is loaded from the cache and reconstructed.
What's the easiest way to prevent only one attribute from being marshalled?
(I'm aware of this question, but the given answer appears to apply only to ActiveRecord classes.)
from http://www.ruby-doc.org/core-1.9.3/Marshal.html
When dumping an object the method marshal_dump will be called.
marshal_dump must return a result containing the information necessary
for marshal_load to reconstitute the object. The result can be any
object.
When loading an object dumped using marshal_dump the object is first
allocated then marshal_load is called with the result from
marshal_dump. marshal_load must recreate the object from the
information in the result.
So the question you are linking to also applies to you: just override those two methods and you should be fine.
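A sketch of overriding those two methods to skip a single attribute. CachedReport, @title and @record are hypothetical names, with @record standing in for the ActiveRecord-backed instance variable:

```ruby
# Hypothetical class: CachedReport and its attributes are illustrative.
class CachedReport
  attr_accessor :title, :record   # :record stands in for the ActiveRecord object

  def initialize(title, record)
    @title = title
    @record = record
  end

  # Dump every instance variable except the transient one.
  def marshal_dump
    instance_variables.reject { |iv| iv.to_s == "@record" }.map do |iv|
      [iv.to_s, instance_variable_get(iv)]
    end
  end

  # Restore the dumped variables; @record stays unset until reattached.
  def marshal_load(pairs)
    pairs.each { |name, value| instance_variable_set(name, value) }
  end
end

report = CachedReport.new("Q3", Object.new)
copy = Marshal.load(Marshal.dump(report))
copy.record = report.record   # reattach the object you already hold
```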

Coding Style: How to Make Obvious Determination of Parameter's Type We Have To Pass To a Function?

What is the best way to document the type of parameters that a function expects to receive?
Sometimes a function uses only one or two fields of an object. Sometimes these fields have common names (get(), set(), reset(), etc.). In this situation we must leave a comment:
...
@staticmethod
def get(postId, obj):
    """obj is instance of class Type1, not Type2"""
    inner = obj.get()
Is there a more explicit way to make it obvious? Maybe the parameter name should contain the expected type name?
Given python's 'duck-typing' (late bound) behaviour, it would be a mistake to require a particular type.
If you know which types your function must not take, you can raise an exception after detecting those; otherwise, simply raise an exception if the object passed does not support the appropriate protocol.
As to documentation, just put the required protocol in the docstring.
One strength of Python is "duck typing", that is, relying not on the actual type of a variable but on its behaviour. So I'd suggest that you document the fields the object should contain.
"""obj should have a field 'foo' like in class 'bar' or 'baz' """
First of all, name your methods properly, and use properties if they make sense.
You should try to get the hang of duck-typing. It's pretty useful. And if not, try and see if abstract base classes help you do what you want.

How can I update an already instantiated Ruby object with YAML?

Basically, I have an instance of a Ruby object already but want to update whatever instance variables I can from yaml. There is a to_yaml function that will dump my object to yaml. I'm looking for something in the reverse. For example, my_obj.from_yaml(yaml_stuff) and have it update instance variables from the yaml passed in.
Would I need to, in my from_yaml function, use YAML::load and copy each instance variable? Is there a function I can use to quickly copy those variables without much typing if that is the case?
Does Ruby's yaml library have something already where I can pass it the object and the yaml and it'll just do what I want it to do?
Editing for clarity
This is a simple object that will store and load very simple yaml compatible types such as strings and integers.
What I ended up doing
Although I answered this question, I wanted to add what I ended up doing, my Object monkey patch:
class Object
  def from_yaml(yml)
    return if yml.nil?
    yml.instance_variables.each do |iv|
      if instance_variable_defined?(iv)
        instance_variable_set(iv, yml.instance_variable_get(iv))
      end
    end
  end
end
Your question is not clear enough. Which class are you talking about? What kind of YAML documents? You can't have everything serialized to and from YAML.
Let's assume that your object just has a set of instance variables of simple, YAML-compatible types, such as strings, numbers and symbols.
In that case, you can write a from_yaml method that loads the YAML file into a hash of key->value pairs, iterates through it, and updates every instance variable named key with value. Does that seem useful, and if it does, do you need help writing such a method?
Edit:
There is no need for you to keep your object state in a hash - you can still use ivars and attr_accessors - just open up a new module (say YamlUpdateable), implement a from_yaml method which would update your ivars from a hash deserialized from YAML, and include the module in whichever class you want to deserialize from YAML.
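A minimal sketch of such a module, assuming the YAML document is a flat mapping whose keys match instance variable names. The Settings class is hypothetical, and YAML.safe_load is used so that only plain types are deserialized:

```ruby
require "yaml"

# Module name taken from the answer above; the behaviour is a sketch.
module YamlUpdateable
  # Update existing instance variables from a YAML mapping of name => value.
  # Keys with no matching instance variable are ignored.
  def from_yaml(yaml_text)
    data = YAML.safe_load(yaml_text)
    return self unless data.is_a?(Hash)

    data.each do |key, value|
      next unless key.to_s =~ /\A[A-Za-z_]\w*\z/   # skip keys that are not valid names

      ivar = "@#{key}"
      instance_variable_set(ivar, value) if instance_variable_defined?(ivar)
    end
    self
  end
end

# Hypothetical class used to try the module out.
class Settings
  include YamlUpdateable
  attr_accessor :host, :port

  def initialize
    @host = "localhost"
    @port = 80
  end
end

settings = Settings.new
settings.from_yaml("port: 8080\ncolor: red\n")
settings.port # => 8080
settings.host # => "localhost"
```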
As far as I know, there's nothing like that included with the YAML library itself; it's mostly meant for dumping and reading data, not keeping it up-to-date in memory and on disk. If you're planning to keep data in memory and on disk synced with each other with minimal hassle, have you considered a data persistence library like ActiveRecord or Stone?
If you're still keen on using the YAML library, and assuming you don't have many different classes to persist, it might make sense to simply write a small "updater" method that updates an object of that class given a similar object. Or you could rework your application to make sure you can simply reload all the objects from the YAML without having to update them (i.e., dump the old objects and create new ones).
The other option is to use metaprogramming to read into an object's properties and update them accordingly, but that seems error-prone and dangerous.
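What you are looking for is merge, provided my_obj is a Hash (or defines merge itself); it is not available on arbitrary objects. Note that Hash#merge returns a new Hash, while merge! updates the receiver in place.
# fetch yaml file
yml = YAML.load_file("path/to/file.yml")
# merge variables into the existing hash
my_obj.merge!(yml)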

Ruby Style: should initialize take a file with data or just the raw data as parameters

I was curious if anyone had insight on what is the best way for an object to load data from a file in Ruby. Is there a convention? There are two ways I can think of accomplishing this:
Have the initialize method accept a path or file and parse the data within the initialize method, setting the object variables as well.
Have the main "runner" code open the file and parse it, then pass the correct arguments to your constructor.
I am also aware that I could support both methods through an options hash or *args and looking at its size, but I do not have any need to implement both.
I would use the second option, combined with providing the path info as an argument to the main code. This makes it more portable and keeps the object decoupled from the source of the data.
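A sketch of the second option, with a hypothetical Inventory class and a made-up "name,count" line format. The parsing lives in the runner code, so the class never touches the file system:

```ruby
# Hypothetical class: Inventory and the "name,count" line format are illustrative.
class Inventory
  attr_reader :items

  # The constructor takes already-parsed data and knows nothing about files.
  def initialize(items)
    @items = items
  end

  def total
    items.values.sum
  end
end

# "Runner" code: turns raw text into the arguments the constructor expects.
def parse_inventory(text)
  text.each_line.map(&:strip).reject(&:empty?).map do |line|
    name, count = line.split(",")
    [name, count.to_i]
  end.to_h
end

# In a real script the path would come from ARGV and the text from File.read(path);
# a literal string is used here so the sketch runs on its own.
inventory = Inventory.new(parse_inventory("apple,3\npear,4\n"))
inventory.total # => 7
```

Because the constructor only sees a Hash, tests can build an Inventory directly without creating files.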
