I have been using Python for a while now and I'm happy using it in most forms, but I am wondering which form is more Pythonic. Is it right to emulate objects and types, or is it better to subclass (inherit from) these types? I can see advantages and disadvantages to both. What's the correct way to do this?
Subclassing method
class UniqueDict(dict):
    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)

    def __setitem__(self, key, value):
        if key not in self:
            dict.__setitem__(self, key, value)
        else:
            raise KeyError("Key already exists")
Emulating method
class UniqueDict(object):
    def __init__(self, *args, **kwargs):
        self.di = dict(*args, **kwargs)

    def __setitem__(self, key, value):
        if key not in self.di:
            self.di[key] = value
        else:
            raise KeyError("Key already exists")
The key question you have to ask yourself here is:
"How should my class change if the 'parent' class changes?"
Imagine new methods are added to dict which you don't override in your UniqueDict. If you want to express that UniqueDict is simply a small derivation in behaviour from dict's behaviour, then you'd go with inheritance since you will get changes to the base class automatically. If you want to express that UniqueDict kinda looks like a dict but actually isn't, you should go with the 'emulation' mode.
Subclassing is better as you won't have to implement a proxy for every single dict method.
I would go for subclass, and for the reason I would refer to the motivation of PEP 3119:
For example, if asking 'is this object a mutable sequence container?', one can look for a base class of 'list', or one can look for a method named '__getitem__'. But note that although these tests may seem obvious, neither of them are correct, as one generates false negatives, and the other false positives.
The generally agreed-upon remedy is to standardize the tests, and group them into a formal arrangement. This is most easily done by associating with each class a set of standard testable properties, either via the inheritance mechanism or some other means. Each test carries with it a set of promises: it contains a promise about the general behavior of the class, and a promise as to what other class methods will be available.
In short, it is sometimes desirable to be able to check for mapping properties using isinstance.
I have a method that returns an object which could be one of many different types of object but which are all part of the same ancestor class. The precise object type is inferred dynamically.
However, I'm confused as to what to put for the return value in the signature. I've put a placeholder below using instance_of to illustrate the problem:
sig{params(instance_class: String).returns(instance_of ParentClass)}
def build_instance instance_class
  klass = Object.const_get(instance_class)
  return klass.new
end
Given that I don't know which precise class will be returned (and I'd prefer not to list them explicitly) but I do know that it will be a subclass of ParentClass is there a way in Sorbet to specify this? I could use T.untyped but it's unnecessarily loose.
Through trial and error I've discovered that checking that the object includes the type in its ancestors is, if I understand correctly, sorbet's default behaviour.
Sorbet won't check that the object precisely matches the specified Type, only that it includes that type in its ancestors (perhaps this is what Typechecking in general means but I'm fairly new to the game).
To avoid the following error though:
Returning value that does not conform to method result type https://srb.help/7005
you also need to T.cast() the object that you return to the ParentClass:
sig{params(instance_class: String).returns(ParentClass)}
def build_instance instance_class
  klass = Object.const_get(instance_class)
  # NB instance is a descendant of ParentClass, not an instance...
  return T.cast(klass.new, ParentClass)
end
This seems to work but I'd love to know whether it's the correct way to solve the problem.
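Independently of the static signature, the "includes that type in its ancestors" rule described above can be enforced at runtime in plain Ruby. The following is only a sketch, not Sorbet's own mechanism: ChildA and Unrelated are hypothetical classes invented for the illustration, and the guard uses the standard Module#<= comparison to reject constants that do not descend from ParentClass.

```ruby
# Hypothetical class hierarchy, for illustration only.
class ParentClass; end
class ChildA < ParentClass; end
class Unrelated; end

def build_instance(instance_class)
  klass = Object.const_get(instance_class)
  # Module#<= is true when klass is ParentClass or one of its descendants,
  # and nil when the two classes are unrelated.
  unless klass.is_a?(Class) && klass <= ParentClass
    raise TypeError, "#{instance_class} is not a ParentClass descendant"
  end
  klass.new
end

child = build_instance("ChildA")
```

With a guard like this, the object handed back is known to satisfy is_a?(ParentClass), which is the same ancestor-based check the answer describes.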
Some open source code I'm integrating in my application has some classes that include code to that effect:
class SomeClass < SomeParentClass
  def self.new(options = {})
    super().tap { |o|
      # do something with `o` according to `options`
    }
  end

  def initialize(options = {})
    # initialize some data according to `options`
  end
end
As far as I understand, both self.new and initialize do the same thing - the latter one "during construction" and the former one "after construction", and it looks to me like a horrible pattern to use - why split up the object initialization into two parts where one is obviously "The Wrong Thing(tm)"?
Ideally, I'd like to see what is inside the super().tap { |o| block, because although this looks like bad practice, just maybe there is some interaction required before or after initialize is called.
Without context, it is possible that you are just looking at something that works but is not considered good practice in Ruby.
However, maybe the approach of separate self.new and initialize methods allows the framework designer to implement a subclassable part of the framework and still ensure that the setup the framework requires is completed, without slightly awkward documentation that demands a specific use of super(). It would be slightly easier to document, and a cleaner-looking API, if the end user gets the functionality they expect with just the subclass declaration class MyClass < FrameworkClass and without some additional note like:
When you implement the subclass initialize, remember to put super at the start, otherwise the magic won't work
. . . personally I'd find that design questionable, but I think there would at least be a clear motivation.
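A minimal sketch of that motivation, assuming a made-up FrameworkClass with a hypothetical framework_setup hook (neither comes from the code in the question): the custom self.new runs the framework's setup after initialize, so a subclass that never calls super still ends up fully set up.

```ruby
class FrameworkClass
  attr_reader :registered

  def self.new(*args, &block)
    obj = super                 # Class#new: allocate, then the subclass's initialize
    obj.send(:framework_setup)  # framework work the subclass author cannot skip
    obj
  end

  private

  # Hypothetical setup step standing in for "the magic".
  def framework_setup
    @registered = true
  end
end

class MyClass < FrameworkClass
  attr_reader :name

  def initialize(name)
    @name = name # note: no call to super here
  end
end

obj = MyClass.new("demo")
```

Here obj.registered is true even though MyClass#initialize never mentions the framework.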
There might be deeper Ruby language reasons to have code run in a custom self.new block. For instance, it may allow the constructor to switch or alter the specific object (even returning an object of a different class) before returning it. However, I have very rarely seen such things done in practice; there is nearly always some other way of achieving the goals of such code without customising new.
Examples of custom/different Class.new methods raised in the comments:
Struct.new which can optionally take a class name and return objects of that dynamically created class.
Single-table inheritance for ActiveRecord, which allows the end user to load an object of unknown class from a table and receive the right object.
The latter one could possibly be avoided with a different ORM design for inheritance (although all such schemes have pros/cons).
The first one (Structs) is core to the language, so has to work like that now (although the designers could have chosen a different method name).
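To make the "returning an object of a different class" case concrete, here is a small sketch. Shape, Circle and Square are hypothetical names, and this is not how Struct or ActiveRecord are implemented; it only shows that a custom new on a base class can hand back an instance of a subclass.

```ruby
class Shape
  def self.new(*args)
    # Subclasses construct normally; only Shape itself dispatches.
    return super unless equal?(Shape)

    kind, *rest = args
    case kind
    when :circle then Circle.new(*rest)
    when :square then Square.new(*rest)
    else raise ArgumentError, "unknown shape: #{kind.inspect}"
    end
  end
end

class Circle < Shape
  attr_reader :radius

  def initialize(radius)
    @radius = radius
  end
end

class Square < Shape
  attr_reader :side

  def initialize(side)
    @side = side
  end
end

shape = Shape.new(:circle, 2)
```

The same result is usually reachable with an ordinary factory method (for example a hypothetical Shape.build), which is why overriding new is rarely necessary.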
It's impossible to tell why that code is there without seeing the rest of the code.
However, there is something in your question I want to address:
As far as I understand, both self.new and initialize do the same thing - the latter one "during construction" and the former one "after construction"
They do not do the same thing.
Object construction in Ruby is performed in two steps: Class#allocate allocates a new empty object from the object space and sets its internal class pointer to self. Then, you initialize the empty object with some default values. Customarily, this initialization is performed by a method called initialize, but that is just a convention; the method can be called anything you like.
There is an additional helper method called Class#new which does nothing but perform the two steps in sequence, for the programmer's convenience:
class Class
  def new(*args, &block)
    obj = allocate
    obj.send(:initialize, *args, &block)
    obj
  end

  def allocate
    obj = __MagicVM__.__allocate_an_empty_object_from_the_object_space__
    obj.__set_internal_class_pointer__(self)
    obj
  end
end

class BasicObject
  private def initialize(*) end
end
The constructor new has to be a class method since you start from where there is no instance; you can't be calling that method on a particular instance. On the other hand, an initialization routine initialize is better defined as an instance method because you want to do something specifically with a certain instance. Hence, Ruby is designed to internally call the instance method initialize on a new instance right after its creation by the class method new.
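The two steps can also be performed by hand, which makes the division of labour visible. This is a sketch with a hypothetical Point class; it relies only on Class#allocate and on initialize being a private instance method.

```ruby
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end
end

# Step 1: allocate an empty Point; initialize has not run yet.
raw = Point.allocate
before = raw.instance_variables # nothing has been set so far

# Step 2: initialize is private, so it is invoked with send.
raw.send(:initialize, 1, 2)

# Point.new performs both steps in one call.
made = Point.new(1, 2)
```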
I have been thinking about blocks in Ruby.
Please consider this code:
div {
  h2 'Hello world!'
  drag
}
This calls the method div(), and passes a block to it.
With yield I can evaluate the block.
h2() is also a method, and so is drag().
Now the thing is - h2() is defined in a module, which
is included. drag() on the other hand resides on an
object and also needs some additional information.
I can provide this at run-time, but not at call-time.
In other words, I need to be able to "intercept"
drag(), change it, and then call that method
on another object.
Is there a way to evaluate yield() line by line
or some other way? I don't have to call yield
yet; it would also be possible to get this
code as a string, modify drag(), and then
eval() it (although this sounds ugly, I
just need to have this available anyway,
no matter how).
If I'm understanding you correctly, it seems that you're looking for the .tap method. Tap allows you to access intermediate results within a method chain. Of course, this would require you to restructure how this is set up.
You can kind of do this with instance_eval and a proxy object.
The general idea would be something like this:
class DSLProxyObject
  def initialize(proxied_object)
    @target = proxied_object
  end

  def drag
    # Do some stuff
    @target.drag
  end

  def method_missing(method, *args, &block)
    @target.send(method, *args, &block)
  end
end
DSLProxyObject.new(target_object).instance_eval(&block)
You could implement each of your DSL's methods, perform whatever modifications you need to when a method is called, and then call what you need to on the underlying object to make the DSL resolve.
It's difficult to answer your question completely without a less general example, but the general idea is that you would create an object context that has the information you need and which wraps the underlying DSL, then evaluate the DSL block in that context, which would let you intercept and modify individual calls on a per-usage basis.
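A fuller sketch of that idea follows. Everything in it is assumed for the illustration: Canvas stands in for whatever object really implements h2 and drag, DragInterceptor is the proxy, and handle is the run-time information that drag needs. The block from the question is evaluated with instance_eval, so the bare drag call lands on the proxy first.

```ruby
# Hypothetical stand-in for the object that really implements the DSL.
class Canvas
  attr_reader :log

  def initialize
    @log = []
  end

  def h2(text)
    @log << "h2: #{text}"
  end

  def drag(handle = nil)
    @log << "drag: #{handle.inspect}"
  end
end

class DragInterceptor
  def initialize(target, handle)
    @target = target
    @handle = handle
  end

  # Intercepted call: add the run-time information, then delegate.
  def drag
    @target.drag(@handle)
  end

  # Everything else passes straight through to the target.
  def method_missing(name, *args, &block)
    if @target.respond_to?(name)
      @target.public_send(name, *args, &block)
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private) || super
  end
end

def div(target, handle, &block)
  DragInterceptor.new(target, handle).instance_eval(&block)
  target
end

canvas = Canvas.new
div(canvas, :corner) do
  h2 'Hello world!'
  drag
end
```

One trade-off of instance_eval is that self changes inside the block, so instance variables and private helpers of the code that wrote the block are not visible there.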
I'm new to Ruby and I'm a bit disappointed that Hash values can't be accessed as objects (myHash.key), as pointed out in many other questions (example: How do I use hash keys as methods on a class?).
I don't like the OpenStruct solution because it's not recursive, and I don't want to modify the Hash class.
Therefore I've developed the following solution. First define the following module:
module NiceHash
  def method_missing(name, *args, &blk)
    if args.empty? && blk.nil? && self.has_key?(name.to_s)
      result = self[name.to_s]
      if result.is_a? Hash
        # extend nested hashes too (the module is NiceHash, not Nice_Hash)
        result.extend(NiceHash)
      end
      return result
    else
      super
    end
  end

  def respond_to?(sym, include_private = false)
    super(sym, include_private) || (self.has_key?(sym.to_s))
  end
end
And then use it with
a={"a"=>"a"}
a.extend(NiceHash)
a.a
The solution works.
My question is: this could also be done with a wrapper class. Which would be better? Are there any (hidden) problems with the given solution?
Responding to your question about hidden problems with your solution, there are two main ones.
As other people have mentioned in comments, it will break when the keys are not valid Ruby method names.
Your solution is based on method_missing, just like OpenStruct is. Therefore it's subject to the same shortcomings. As stated in the OpenStruct docs:
An OpenStruct utilizes Ruby’s method lookup structure to find and
define the necessary methods for properties. This is accomplished
through the method method_missing and define_method.
This should be a consideration if there is a concern about the
performance of the objects that are created, as there is much more
overhead in the setting of these properties compared to using a Hash
or a Struct.
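The first problem is easy to reproduce. The sketch below uses a lightly tidied copy of the module (respond_to_missing? in place of respond_to?, which is my substitution, not the asker's code) and a made-up hash. It also shows a related pitfall that follows from how method_missing works: a key that shares its name with an existing Hash method, such as "size", never reaches method_missing at all.

```ruby
module NiceHash
  def method_missing(name, *args, &blk)
    if args.empty? && blk.nil? && key?(name.to_s)
      value = self[name.to_s]
      value.extend(NiceHash) if value.is_a?(Hash)
      value
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    key?(name.to_s) || super
  end
end

# Sample data invented for the illustration.
h = { "user" => { "name" => "Ann" }, "first-name" => "Ann", "size" => 99 }
h.extend(NiceHash)

nested   = h.user.name                  # nested access works
shadowed = h.size                       # Hash#size answers, not the "size" key
dashed   = h.public_send("first-name")  # h.first-name is not valid method-call syntax
```

A wrapper class has the same two limitations as long as it relies on method_missing, although it exposes far fewer built-in method names that can shadow keys.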
Is there any way to make instance variables "private" (C++ or Java definition) in Ruby? In other words, I want the following code to result in an error.
class Base
  def initialize()
    @x = 10
  end
end

class Derived < Base
  def x
    @x = 20
  end
end
d = Derived.new
Like most things in Ruby, instance variables aren't truly "private" and can be accessed by anyone with d.instance_variable_get :@x.
Unlike in Java/C++, though, instance variables in Ruby are always private. They are never part of the public API like methods are, since they can only be accessed with that verbose getter. So if there's any sanity in your API, you don't have to worry about someone abusing your instance variables, since they'll be using the methods instead. (Of course, if someone wants to go wild and access private methods or instance variables, there isn’t a way to stop them.)
The only concern is if someone accidentally overwrites an instance variable when they extend your class. That can be avoided by using unlikely names, perhaps calling it @base_x in your example.
Never use instance variables directly. Only ever use accessors. You can define the reader as public and the writer private by:
class Foo
  attr_reader :bar

  private

  attr_writer :bar
end
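However, keep in mind that private and protected do not mean what you think they mean. Public methods can be called against any receiver: named, self, or implicit (x.baz, self.baz, or baz). Protected methods can be called with an implicit receiver or with an explicit one, but only from inside a method of an object of the same class or a subclass (baz, self.baz, or other.baz from within such a method). Private methods may only be called with an implicit receiver (baz); the exceptions are setters such as self.bar = value and, since Ruby 2.7, any call through a literal self (self.baz).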
Long story short, you're approaching the problem from a non-Ruby point of view. Always use accessors instead of instance variables. Use public/protected/private to document your intent, and assume consumers of your API are responsible adults.
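The visibility rules can be checked with a short sketch. Account, secret_score and richer_than? are hypothetical names chosen for the illustration; the private writer follows the attr_reader / private attr_writer pattern described in this answer.

```ruby
class Account
  attr_reader :balance

  def initialize(balance)
    self.balance = balance # setters may be called through self even when private
  end

  # Protected methods accept an explicit receiver of the same class.
  def richer_than?(other)
    secret_score > other.secret_score
  end

  protected

  def secret_score
    balance * 2
  end

  private

  attr_writer :balance
end

a = Account.new(10)
b = Account.new(5)
comparison = a.richer_than?(b)
```

Calling a.balance = 1 or a.secret_score from outside the class raises NoMethodError, while a.balance remains readable.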
It is possible (but inadvisable) to do exactly what you are asking.
There are two different elements of the desired behavior. The first is storing x in a read-only value, and the second is protecting the getter from being altered in subclasses.
Read-only value
It is possible in Ruby to store read-only values at initialization time. To do this, we use the closure behavior of Ruby blocks.
class Foo
  def initialize(x)
    define_singleton_method(:x) { x }
  end
end
The initial value of x is now locked up inside the block we used to define the getter #x and can never be accessed except by calling foo.x, and it can never be altered.
foo = Foo.new(2)
foo.x # => 2
foo.instance_variable_get(:@x) # => nil
Note that it is not stored as the instance variable @x, yet it is still available via the getter we created using define_singleton_method.
Protecting the getter
In Ruby, almost any method of any class can be overwritten at runtime. There is a way to prevent this using the method_added hook.
class Foo
  def self.method_added(name)
    raise(NameError, "cannot change x getter") if name == :x
  end
end

class Bar < Foo
  def x
    20
  end
end
# => NameError: cannot change x getter
This is a very heavy-handed method of protecting the getter.
It requires that we add each protected getter to the method_added hook individually, and even then, you will need to add another level of method_added protection to Foo and its subclasses to prevent a coder from overwriting the method_added method itself.
Better to come to terms with the fact that code replacement at runtime is a fact of life when using Ruby.
Unlike methods having different levels of visibility, Ruby instance variables are always private (from outside of objects). However, inside objects instance variables are always accessible, either from parent, child class, or included modules.
Since there probably is no way to alter how Ruby accesses @x, I don't think you could have any control over it. Writing @x would just directly pick that instance variable, and since Ruby doesn't provide visibility control over variables, live with it I guess.
As @marcgg says, if you don't want derived classes to touch your instance variables, don't use them at all or find a clever way to hide them from derived classes.
It isn't possible to do what you want, because instance variables aren't defined by the class, but by the object.
If you use composition rather than inheritance, then you won't have to worry about overwriting instance variables.
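A short sketch of the difference, reusing Base and Derived from the question and adding a hypothetical Wrapper class: with inheritance there is one object and therefore one @x, while with composition the wrapper and the wrapped object each keep their own.

```ruby
class Base
  def initialize
    @x = 10
  end

  def x
    @x
  end
end

# Inheritance: Derived shares the single object's @x with Base.
class Derived < Base
  def clobber
    @x = 20
  end
end

# Composition: Wrapper has its own @x; the Base instance keeps its own too.
class Wrapper
  def initialize
    @base = Base.new
    @x = 20
  end

  def base_x
    @base.x
  end
end

d = Derived.new
d.clobber
w = Wrapper.new
```

After this runs, d.x reports the overwritten value, while w.base_x still reports the value that Base set.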
If you want protection against accidental modification, I think attr_accessor can be a good fit.
class Data
  attr_accessor :id
  private :id=
end
That makes the writer id= private while the reader id stays public. You can also use the public attr_reader and private attr_writer syntax, like so:
class Data
  attr_reader :id

  private

  attr_writer :id
end
I know this is old, but I ran into a case where I didn't so much want to prevent access to @x as exclude it from any methods that use reflection for serialization. Specifically, I use YAML::dump often for debugging purposes, and in my case @x was of class Class, which YAML::dump refuses to dump.
In this case I had considered several options
Addressing this just for yaml by redefining "to_yaml_properties"
def to_yaml_properties
  super - ["@x"]
end
but this would have worked only for YAML, and other dumpers (to_xml?) would still not be happy
Addressing for all reflection users by redefining "instance_variables"
def instance_variables
  # on Ruby 1.9+ instance_variables returns symbols, so subtract [:@x] there
  super - ["@x"]
end
Also, I found this in one of my searches, but have not tested it as the above seem simpler for my needs
So while these may not be exactly what the OP said he needed, if others find this posting while looking for the variable to be excluded from listing, rather than access - then these options may be of value.