I realize this question may be too philosophical for StackOverflow, but I'm wondering if baseclassing built-in classes to extend their functionality is considered "good" Ruby style.
E.g.
class Grades < Array
  def sum
    sum = 0
    self.each do |num|
      sum += num
    end
    return sum
  end

  def avg
    self.sum / self.length
  end
end
Now Grades objects look like arrays when built, but have the additional sum and avg functions that I want access to. Would it be "better" style not to baseclass Array, but to add this functionality to a generic object?
Yes.
In general, everyone "monkey patches" everything in Ruby freely, from classes you wrote, to classes someone more important than you wrote, to library classes.
However, the general computing style guideline is: if your class does everything, it does nothing. Your example accesses each() and length() efficiently, but your new Grades class now exposes every Array method, including ones you might not want called, and including ones that some cretin might someday go and monkey-patch! So if your Grades class were very public (used by your entire program), you might want to consider delegation.
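For instance, here's a minimal sketch of the delegation approach using Ruby's stdlib Forwardable module (the delegated method list is illustrative; you'd forward only what you want to expose):

require 'forwardable'

class Grades
  extend Forwardable
  include Enumerable

  # Forward only the Array methods we actually want to expose.
  def_delegators :@grades, :each, :length, :<<

  def initialize(grades = [])
    @grades = grades
  end

  def sum
    @grades.inject(0) { |total, num| total + num }
  end

  def avg
    sum / length
  end
end

g = Grades.new([90, 80, 70])
g.avg  # => 80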
Another guideline—one that applies more in some languages than others—is that you should never inherit unless you then override a method, to achieve polymorphism. Yet another rule that the entire Ruby community, including me, enjoys breaking freely.
For this case, I would say subclassing isn't really appropriate. A subclass should be a more specific version of its superclass—for example, Fixnum is a specific sort of Integer (it's a small integer stored in a particular way), which is a specific sort of Numeric (only some numbers are integers), which is a specific sort of Object (only objects that represent numbers are numerics). Your Grades class, on the other hand, is exactly equivalent to an Array except that it can calculate a couple more things about itself.
If Grades constrained something about the data it stored—for example, it only allowed you to insert numerics between 0.0 and 1.0 (or integers between 0 and 100, if you'd prefer)—it might make sense to subclass Array. On the other hand, it might also make sense to have Grades subclass Object directly, and keep the actual grades in an Array attribute.
Adding sum and avg, on the other hand, simply adds functionality that would be equally useful for other kinds of arrays, too. For such generic functionality, I would simply add those methods to Array so you don't have to worry about whether you've got a plain Array or a Grades in a particular place.
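A sketch of that approach (note that newer Ruby versions already ship a built-in Array#sum, and the to_f in avg below is my own choice to avoid integer division):

class Array
  def sum
    inject(0) { |total, num| total + num }
  end

  def avg
    sum / length.to_f
  end
end

[90, 80, 70].avg  # => 80.0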
There are some gray areas here, of course—if you were proposing adding a letter method to convert the grades to an A through F letter grade, I wouldn't be so reluctant to subclass Array to make Grades. This is definitely a judgement call. But for this level of genericness, I really don't think subclassing is appropriate.
In Ruby, morals are freer. Permissible is anything the programmer deems such. Actually, monkey patching existing classes is pretty much standard practice. Apart from what the other two answers said, let me bring your attention to the Ruby 2.0 refine feature, which allows you to monkey-patch within stricter boundaries, with less fear of undesirable interactions with other code.
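A rough sketch of what that looks like (module and method names are mine; refinements were experimental in 2.0 and stabilized in later versions):

module GradeMath
  refine Array do
    def avg
      inject(0.0) { |total, num| total + num } / length
    end
  end
end

using GradeMath  # the refinement is active only in this file/scope
[90, 80, 70].avg  # => 80.0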
But in your particular case, I think that your decision to create a separate class Grades might be a correct one. It's just a gut feeling; I'd have to be familiar with your codebase to say that for sure. It is less important whether you make your Grades class a subclass of Array, or whether you just give it an attribute #grade_array, in which you will store the actual grades and to which you will delegate the methods you want from the Array class.
The chaining of each_slice and to_a confuses me. I know that each_slice is a member of Enumerable and therefore can be called on enumerable objects like arrays, and chars does return an array of characters.
I also know that each_slice will slice the array in groups of n elements, which is 2 in the below example. And if a block is not given to each_slice, then it returns an Enumerator object.
'186A08'.chars.each_slice(2).to_a
But why must we call to_a on the enumerator object if each_slice has already grouped the array into n-element groups? Why doesn't Ruby just evaluate what the enumerator object is (a collection of those groups)?
The purpose of enumerators is lazy evaluation. When you call each_slice, you get back an enumerator object. This object does not calculate the entire grouped array up front. Instead, it calculates each “slice” as it is needed. This helps save on memory, and also allows you quite a bit of flexibility in your code.
This stack overflow post has a lot of information in it that you’ll find useful:
What is the purpose of the Enumerator class in Ruby
To give you a cut-and-dried answer to your question "Why must I call to_a when...": because nothing has been computed yet. The enumerator hasn't looped through the array at all. So far it has just defined an object that says that when it goes through the array, you're going to want elements two at a time. You then have the freedom to either force it to do the calculation on all elements in the enumerable (by calling to_a), or you could alternatively use next or each to go through and then stop partway through (maybe calculate only half of them, as opposed to calculating all of them and throwing the second half away).
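You can watch this happen with external iteration, using the string from the question:

slices = '186A08'.chars.each_slice(2)  # an Enumerator; nothing computed yet
slices.next  # => ["1", "8"]  (computes just the first slice)
slices.next  # => ["6", "A"]  (and only now the second)
slices.to_a  # => [["1", "8"], ["6", "A"], ["0", "8"]]  (forces them all)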
It’s similar to how the Range class does not build up the list of elements in the range. (1..100000) doesn’t make an array of 100000 numbers, but instead defines an object with a min and max and certain operations can be performed on that. For example (1..100000).cover?(5) doesn’t build a massive array to see if that number is in there, but instead just sees if 5 is greater than or equal to 1 and less than or equal to 100000.
The purpose of this all is performance and flexibility.
It may be worth considering whether your implementation actually needs to make an array up front, or whether you can actually keep your RAM consumption down a bit by iterating over the enumerator. (If your real world scenario is as simple as you described, an enumerator won’t help much, but if the array actually is large, an enumerator could help you a lot).
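As a sketch of how far this goes, Ruby's lazy enumerators (Ruby 2.0+) let you chain transformations over an effectively infinite source and pay only for the elements you actually take:

(1..Float::INFINITY).lazy.map { |n| n * n }.first(5)
# => [1, 4, 9, 16, 25]  (only five squares are ever computed)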
When working with indexed collections (most often immutable Vectors) I often use coll.last as what I supposed to be a convenient shortcut for coll(coll.size - 1). While randomly inspecting my sources, I clicked through to the last implementation, and the IntelliJ IDE took me to TraversableLike.last, which traverses all elements to eventually reach the last one.
This was a surprise to me, and I am not sure now what the reason for this is. Is last really implemented this way? Is there some reason preventing last from being implemented efficiently for IndexedSeq (or perhaps for IndexedSeqLike)?
(Scala SDK used is 2.11.4)
IndexedSeq does not override last (it only inherits it from TraversableLike) - the fact that a particular sequence supports indexed access does not necessarily make indexed lookups faster than traversals. However, such optimized implementations are given in IndexedSeqOptimized, which I would expect many implementations to inherit from. In the specific case of Vector, last is overridden explicitly in the class itself.
IndexedSeq has constant access time for an arbitrary element. LinearSeq has linear time. TraversableLike is just the common interface, and you may find that last is overridden inside the IndexedSeqOptimized trait:
A template trait for indexed sequences of type IndexedSeq[A] which optimizes the implementation of several methods under the assumption of fast random access.
def last: A = if (length > 0) this(length - 1) else super.last
You may also find the quick random access implementation inside Vector.getElem - it uses a tree of arrays with a high branching factor, so usually it's O(1) for apply. It doesn't use IndexedSeqOptimized, but it has its own overridden last:
override /*TraversableLike*/ def last: A = {
  if (isEmpty) throw new UnsupportedOperationException("empty.last")
  apply(length - 1)
}
So it's a bit of a mess inside the Scala collections, which is very common for Scala internals. Anyway, last on IndexedSeqs is O(1) de facto, regardless of the tricky collections architecture.
The intricacy of the Scala collections is actually an active topic. A talk (and slides) criticizing Scala's collection framework can be found at Paul Phillips: Scala Collections: Why Not?, and Paul Phillips is developing his own alternative version of the standard library.
I'm currently developing a dynamically typed language.
One of the main problems I'm facing during development is how to do fast runtime symbol lookups.
For general, free global and local symbols, I simply index them: each scope (global or local) keeps an array of its symbols and looks them up quickly by index. I'm very happy with this approach.
However, for attributes in objects the problem is much harder. I can't use the same indexing scheme on them, because I have no idea which object I'm currently accessing, thus I don't know which index to use!
Here's an example in python which reflects what I want working in my language:
from random import random

class A:
    def __init__(self):
        self.a = 10
        self.c = 30

class B:
    def __init__(self):
        self.c = 20

def test():
    if random():
        foo = A()
    else:
        foo = B()
    # There could even be an eval here that sets foo
    # to something different or removes attribute c from foo.
    print(foo.c)
Does anyone know any clever tricks to do this lookup quickly? I know about hash maps and splay trees, so I'm interested in whether there are any ways to do it as efficiently as my other lookups.
Once you've reached the point where looking up properties in the hash table isn't fast enough, the standard next step is inline caching. You can do this in JIT languages, or even bytecode compilers or interpreters, though it seems to be less common there.
If the shape of your objects can change over time (i.e. you can add new properties at runtime) you'll probably end up doing something similar to V8's hidden classes.
A technique known as maps can store the values for each attribute in a compact array. The knowledge which attribute name corresponds to which index is maintained in an auxiliary data structure (the eponymous map), so you don't immediately gain a performance benefit (though it does use memory more efficiently if many objects share a set of attributes). With a JIT compiler, you can make the map persistent and constant-fold lookups, so the final machine code can use constant offsets into the attributes array (for constant attribute names).
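Here is a rough sketch of the maps idea, written in Ruby for brevity; all names are illustrative, and a real implementation would transition an object to a new map when its shape changes rather than mutating a map shared with other objects:

class Map
  def initialize
    @offsets = {}  # attribute name => index into each object's value array
  end

  def offset_for(name)
    @offsets[name] ||= @offsets.size  # assign the next slot on first sight
  end

  def lookup(name)
    @offsets[name]
  end
end

class Obj
  def initialize(map)
    @map = map     # shared by every object with this attribute set
    @values = []   # compact per-object storage, no per-object hash table
  end

  def set(name, value)
    @values[@map.offset_for(name)] = value
  end

  def get(name)
    idx = @map.lookup(name)
    raise NoMethodError, "undefined attribute #{name}" unless idx
    @values[idx]
  end
end

a = Obj.new(Map.new)
a.set("a", 10)
a.set("c", 30)
a.get("c")  # => 30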
In an interpreter (I'll assume byte code), things are much harder because you don't have much opportunity to specialize code for specific objects. However, I have an idea myself for turning attribute names into integral keys. Maintain a global mapping assigning integral IDs to attribute names. When adding new byte code to the VM (loading from disk or compiling in memory), scan for strings used as attributes, and replace them with the associated ID, creating a new ID if the string hasn't been seen before. Instead of storing hash tables or similar mappings on each object - or in the map, if you use maps - you can now use sparse arrays, which are hopefully more compact and faster to operate on.
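The global mapping itself can be as simple as this (a minimal sketch):

# Auto-assign the next integer ID the first time an attribute name is seen.
ATTRIBUTE_IDS = Hash.new { |ids, name| ids[name] = ids.size }

ATTRIBUTE_IDS["a"]  # => 0
ATTRIBUTE_IDS["c"]  # => 1
ATTRIBUTE_IDS["a"]  # => 0  (stable on every later lookup)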
I haven't had a chance to implement and test this, and you still need a sparse array. Unless you want to make all objects (or maps) take as many words of memory as there are distinct attribute names in the whole program, that is. At least you can replace string hash tables with integer hash tables.
Just by tuning a hash table for IDs as keys, you can make several optimizations: Don't invoke a hash function (use the ID as hash), remove some indirection and hence cache misses, save yourself the complexity of dealing with pathologically bad hash functions, etc.
I'm doing something like this with a list 'a':
a.each_with_index do |outer, i|
  a.each_with_index do |inner, j|
    if j > i
      # do some operation with outer and inner
    end
  end
end
If the iterator is not going to use the same order, this won't work. I don't care what the order actually is; I just need the two each_with_index iterators to use the same order.
I would assume that it would be a property of an array that it has a fixed order and I'm just being paranoid that the iterator wouldn't use that order...
This depends on the specific Enumerable object you are operating on.
Arrays, for example, will always return elements in the same order. But other enumerable objects are not guaranteed to behave this way. A good example of this is the base Hash in Ruby 1.8.7. That is why many frameworks (most notably ActiveSupport) implement an OrderedHash.
One interesting side note: even Hash will return objects in the same order if the hash has not changed between calls. While many objects behave this way, relying on this subtlety is probably not a great idea.
So, no. The generic each will not always return objects in the same order.
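Since your a is an Array, though, the nested loop from the question is safe; a quick check of what it visits:

a = [10, 20, 30]
pairs = []
a.each_with_index do |outer, i|
  a.each_with_index do |inner, j|
    pairs << [outer, inner] if j > i
  end
end
pairs  # => [[10, 20], [10, 30], [20, 30]]  (each unordered pair exactly once)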
P.S. Ruby 1.9's hashes are now actually ordered: http://www.igvita.com/2009/02/04/ruby-19-internals-ordered-hash
I've not looked at your actual code but here is your answer taken from the Ruby API docs:
Arrays are ordered, integer-indexed collections of any object.
So yes, you are being paranoid but surely that's a good thing when you're developing?
Array by definition is an ordered list of elements. So you should have no problems with that.
It depends on the specific Enumerable. Certainly an Array will always iterate in the obvious order.
It would be quite lunatic fringe for someone to implement an each method that would traverse the same collection in different ways, but the only actual restriction for such a "feature" would be in the documentation for the class that mixes in Enumerable. Well, in that and the sanity of the implementors.
I can almost imagine some sort of cryptographic API that deliberately traversed a collection in an unpredictable way.
This is something that really confuses me: it seems like time and time again I run into methods on Ruby's native data types that do (essentially) the same thing, yet have different names. If duck typing is so strongly encouraged by Ruby and the Ruby community, why aren't these methods named consistently across types?
You seem to imply that Hash does not have a length method and/or that other enumerables don't have a count method. That is not true.
count is a method defined in the Enumerable module and thus available on all enumerables. It differs from size and length in the following ways:
It optionally takes a block (or a plain value) specifying which elements to count.
It's available on all enumerables, not just those that keep track of their size; however, it runs in O(n) for those that don't (and always when given a block, of course).
length and size (which are synonyms) are methods defined on all enumerable classes that keep track of their size (including Hash). They differ from count in that they always return the length in O(1) time and don't take a block.
In summary: You can call length or size on any object that keeps track of its size and you can call count on any enumerable. So duck typing is not hampered in any way.
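A quick illustration of the differences (count also accepts a plain value, counting its occurrences):

arr = [1, 2, 2, 3]
arr.length             # => 4  (O(1): Array keeps track of its size)
arr.size               # => 4  (synonym of length)
arr.count              # => 4
arr.count(2)           # => 2  (occurrences of the value 2, O(n))
arr.count(&:odd?)      # => 2  (elements for which the block is true, O(n))

{ a: 1, b: 2 }.length  # => 2  (Hash keeps track of its size too)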