Why return an enumerator? - ruby

I''m curious about why ruby returns an Enumerator instead of an Array for something that seems like Array is an obvious choice. For example:
'foo'.class
# => String
Most people think of a String as an array of chars.
'foo'.chars.class
# => Enumerator
So why does String#chars return an Enumerable instead of an Array? I'm assuming somebody put a lot of thought into this and decided that Enumerator is more appropriate but I don't understand why.

If you want an Array, call #to_a. The difference between Enumerable and Array is that one is lazy and the other eager. It's the good old memory (lazy) vs. cpu (eager) optimization. Apparently they chose lazy, also because
str = "foobar"
chrs = str.chars
chrs.to_a # => ["f", "o", "o", "b", "a", "r"]
str.sub!('r', 'z')
chrs.to_a # => ["f", "o", "o", "b", "a", "z"]

Abstraction - the fact that something may be an Array is an implementation detail you don't care about for many use cases. For those where you do, you can always call .to_a on the Enumerable to get one.
Efficiency - Enumerators are lazy, in that Ruby doesn't have to build the entire list of elements all at once, but can do so one at a time as needed. So only the number you need is actually computed. Of course, this leads to more overhead per item, so it's a trade-off.
Extensibility - the reason chars returns an Enumerable is because it is itself implemented as an enumerator; if you pass a block to it, that block will be executed once per character. That means there's no need for e.g. .chars.each do ... end; you can just do .chars do ... end. This makes it easy to construct operation chains on the characters of the string.

This completely in accordance with the spirit of 1.9: to return enumerators whenever possible. String#bytes, String#lines, String#codepoints, but also methods like Array#permutation all return an enumerator.
In ruby 1.8 String#to_a resulted in an array of lines, but the method is gone in 1.9.

'Most people think of a String as an array of chars' ... only if you think like C or other languages. IMHO, Ruby's object orientation is much more advanced than that. Most Array operations tend to be more Enumerable like, so it probably makes more sense that way.
An array is great for random access to different indexes, but strings are rarely accessed by a particular index. (and if you are trying to to access a particular index, I suspect you are probably doing school work)
If you are trying to inspect each character, Enumerable works. With Enumberable, you have access to map, each, inject, among others. Also for substitution, there are string functions and regular expressions.
Frankly, I can't think of a real world need for an array of chars.

Maybe a string in ruby is mutable? Then having an Array isn't really an obvious choice - the length could change, for instance. But you will still want to enumerate the characters...
Also, you don't really want to be passing around the actual storage for the characters of a string, right? I mean, I don't remember much ruby (it's been a while), but if I were designing the interface, I'd only hand out "copies" for the .chars method/attribute/whatever. Now... Do you want to allocate a new array each time? Or just return a little object that knows how to enumerate the characters in the string? Thus, keeping the implementation hidden.
So, no. Most people don't think of a string as an array of chars. Most people think of a string as a string. With a behavior defined by the library/language/runtime. With an implementation you only need to know when you want to get nasty and all private with stuff below the abstraction belt.

Actually 'foo'.chars passes each character in str to the given block, or returns an enumerator if no block is given.
Check it :
irb(main):017:0> 'foo'.chars
=> #<Enumerable::Enumerator:0xc8ab35 #__args__=[], #__object__="foo", #__method__=:chars>
irb(main):018:0> 'foo'.chars.each {|p| puts p}
f
o
o
=> "foo"

Related

What does `:|` do in Ruby?

I found the following syntax in another question, and I have been unable to find any documentation on what its doing - I'm assuming syntactic sugar of some sort:
[array1, array2, array3, array4].compact.reduce([], :|)
I allows for one of the arrays to be nil instead of an array, and seems to work like a charm. Can anyone point me in the right direction to understand what is going on?
The original question is here: Merge arrays if not nil and not empty
It's a symbol, like :test, but a single character symbol.
The two-argument version of reduce accepts as a second argument a method name, the name of the method in this case is :|, or the | method. | on arrays is a set operation, it "or"s the arrays together, giving you the unique superset of all elements contained in both arrays. This isn't a particularly idiomatic use of reduce, you could achieve the same thing with .flatten.uniq
If you wanted to add the numbers, you could use :+, or to multiply you could use :*.
It's the same thing as this:
[array1, array2, array3, array4].compact.reduce([]) do |memo, array|
memo | array
end
Although it has syntactic sugar, Array#| is a method which you can see the docs for here. As the docs say:
Set Union — Returns a new array by joining ary with other_ary, excluding any duplicates and preserving the order from the original array
When the block of reduce takes this particular form (calling a single method on memo, passing the iteration's element as an argument), you can omit the block and just pass the method name.

Cleaner way of mapping a hash in ruby

Let's assume I need to do a trivial task on every element of a Hash, e.g. increment its value by 1, or change value into an array containing that value. I've been doing it like this
hash.map{ |k, v| [k, v+1] }.to_h
v+1 is just an example, it can be anything.
Is there any cleaner way to do this? I don't really like mapping a hash to an array of 2-sized arrays, then remembering to convert it to hash again.
Example of what might be nicer:
hash.hash_map{ |v| v+1 }
This way some thing like string conversion (to_s) might be simplified to
hash.hash_map(&:to_s)
Duplication clarification:
I'm not looking for Hash[...] or .to_h, I'm asking if anyone knows a more compact and cleaner solution.
That's just the way Ruby's collection framework works. There is one map method in Enumerable which doesn't know anything about hashes or arrays or lists or sets or trees or streams or whatever else you may come up with. All it knows is that there is a method named each which will yield one single element per iteration. That's it.
Note that this is the same way the collections frameworks of Java and .NET work, too. All collections operations always return the same type: in .NET, that's IEnumerable, in Ruby, that's Array.
Another design approach is that collections operations are type-preserving, i.e. mapping a set will produce a set, etc. That's the way it is done in Smalltalk, for example. However, in Smalltalk, but there it is achieved by copy&pasting almost identical methods into each and every different collection. I.e. if you want to implement your own collection, in Ruby, you only have to implement each, and you get everything else for free, whereas in Smalltalk, you have to implement every single collection method separately. (In Ruby, that would be over 40 methods.)
Scala is the first language that managed to provide a collections framework with type-preserving operations without code duplication, but it took until Scala 2.8 (released in 2010) to figure that out. (The key is the idea of collection builders.) Ruby's collections library was designed in 1993, 17 years before we had figured out how to do type-preserving collections operations without code duplication. Plus, Scala depends heavily on its sophisticated static type system and type-level metaprogramming to find the correct collection builder at compile time. This is not necessary for the scheme to work, but having to look up the builder for every operation at runtime may incur a hefty runtime cost.
What you could do is add new methods that are not part of the standard Enumerable protocol, for example similar to Scala's mapValues and mapKeys.
AFAIK, this does not exist in the Hash out of Ruby box, but here is a simple monkeypatch to achieve what you want:
▶ class Hash
▷ def hash_map &cb
▷ keys.zip(values.map(&cb)).to_h
▷ end
▷ end
There are more readable ways to achieve the requested functionality, but this one uses the built-in map for values once, pretending to be the fastest implementation that comes into my mind.
▶ h = {a: 1, b: 2}
#⇒ { :a => 1, :b => 2 }
▶ h.hash_map do |v| v + 5 end
#⇒ { :a => 6, :b => 7 }

File.open('file.txt') vs. File.open('file.txt').readlines

I checked using File.open('file.txt').class and File.open('file.txt').readlines.class and the former one returns File and the latter returns Array.
I understand this difference, but if I do something like:
File.open('file.txt').collect {|l| l.upcase}
=== File.open('file.txt').readlines.collect {|l| l.upcase}
it returns true. So are there any differences between the two objects when each item in the object is being passed to a block as an argument?
And also, I was assuming that the arguments that are passed to the block in both expressions are both a line in the file as a string which makes the comparison return true, is that correct? If so, how do I know what kind of argument will be passed to the block when I write the code? Do I have to check something like the documentation or the source code?
For example, I know how
['a','b','c'].each_with_index { |num, index| puts "#{index + 1}: #{num}" }
works and take this for granted. But how do I know the first argument should be each item in the array and the second the index, instead of the reverse?
Hope that makes sense, thanks!
Get comfortable doing some Ruby introspection in irb.
irb(main):001:0> puts File.ancestors.inspect
[File, IO, File::Constants, Enumerable, Object, Kernel, BasicObject]
This result shows us classes the File class inherits from and that includes the methods of class Enumerable. So what object is returned from File.readlines? An Array I think, let's check.
ri File.readlines
IO.readlines(name, sep=$/ [, open_args]) -> array
IO.readlines(name, limit [, open_args]) -> array
IO.readlines(name, sep, limit [, open_args]) -> array
This may be overkill, but we can verify Enumerable methods exists within an Array.
irb(main):003:0> puts Array.ancestors.inspect
[Array, Enumerable, Object, Kernel, BasicObject]
I'll try to make this response as compact as possible.
The second question - if you operate on objects which come from standard library you may always refer to their documentation in order to be 100% sure what arguments to expect when passing a block. For instance, multiple times you will be using methods like each, select, map (etc ...), on different data structures (Array, Hash, ...). Before you get used to them you may find all information about them in docs for example: http://ruby-doc.org/core-2.2.0/Array.html
If you are operating on non core data structures (for example a class which comes from gem you include, you should always browse it's documentation or sources on Github).
The first question. Result may be the same, when using different methods, but on deeper level there may be some differences. As far as your case is based on files. Reading file content may be processed in two ways. First - read everything into memory and operate on array of string, and the second - read file lines sequentially which may last longer but will not reserve as much memory.
In Enumerable#each_with_index the arguments are in the same order as they are in the method name itself, first the element, then the index. Same thing with each_with_object.

Can't all or most cases of `each` be replaced with `map`?

The difference between Enumerable#each and Enumerable#map is whether it returns the receiver or the mapped result. Getting back to the receiver is trivial and you usually do not need to continue a method chain after each like each{...}.another_method (I probably have not seen such case. Even if you want to get back to the receiver, you can do that with tap). So I think all or most cases where Enumerable#each is used can be replaced by Enumerable#map. Am I wrong? If I am right, what is the purpose of each? Is map slower than each?
Edit:
I know that there is a common practice to use each when you are not interested in the return value. I am not interested in whether such practice exists, but am interested in whether such practice makes sense other than from the point of view of convention.
The difference between map and each is more important than whether one returns a new array and the other doesn't. The important difference is in how they communicate your intent.
When you use each, your code says "I'm doing something for each element." When you use map, your code says "I'm creating a new array by transforming each element."
So while you could use map in place of each, performance notwithstanding, the code would now be lying about its intent to anyone reading it.
The choice between map or each should be decided by the desired end result: a new array or no new array. The result of map can be huge and/or silly:
p ("aaaa".."zzzz").map{|word| puts word} #huge and useless array of nil's
I agree with what you said. Enumerable#each simply returns the original object it was called on while Enumerable#map sets the current element being iterated over to the return value of the block, and then returns a new object with those changes.
Since Enumerable#each simply returns the original object itself, it can be very well preferred over the map when it comes to cases where you need to simply iterate or traverse over elements.
In fact, Enumerable#each is a simple and universal way of doing a traditional iterating for loop, and each is much preferred over for loops in Ruby.
You can see the significant difference between map and each when you're composing these enumaratiors.
For example you need to get new array with indixes in it:
array.each.with_index.map { |index, element| [index, element] }
Or for example you just need to apply some method to all elements in array and print result without changing the original array:
m = 2.method(:+)
[1,2,3].each { |a| puts m.call(a) } #=> prints 3, 4, 5
And there's a plenty another examples where the difference between each and map is important key in the writing code in functional style.

Why were ruby loops designed that way?

As is stated in the title, I was curious to know why Ruby decided to go away from classical for loops and instead use the array.each do ...
I personally find it a little less readable, but that's just my personal opinion. No need to argue about that. On the other hand, I suppose they designed it that way on purpose, there should be a good reason behind.
So, what are the advantages of putting loops that way? What is the "raison d'etre" of this design decision?
This design decision is a perfect example of how Ruby combines the object oriented and functional programming paradigms. It is a very powerful feature that can produce simple readable code.
It helps to understand what is going on. When you run:
array.each do |el|
#some code
end
you are calling the each method of the array object, which, if you believe the variable name, is an instance of the Array class. You are passing in a block of code to this method (a block is equivalent to a function). The method can then evaluate this block and pass in arguments either by using block.call(args) or yield args. each simply iterates through the array and for each element it calls the block you passed in with that element as the argument.
If each was the only method to use blocks, this wouldn't be that useful but many other methods and you can even create your own. Arrays, for example have a few iterator methods including map, which does the same as each but returns a new array containing the return values of the block and select which returns a new array that only contains the elements of the old array for which the block returns a true value. These sorts of things would be tedious to do using traditional looping methods.
Here's an example of how you can create your own method with a block. Let's create an every method that acts a bit like map but only for every n items in the array.
class Array #extending the built in Array class
def every n, &block #&block causes the block that is passed in to be stored in the 'block' variable. If no block is passed in, block is set to nil
i = 0
arr = []
while i < self.length
arr << ( block.nil? ? self[i] : block.call(self[i]) )#use the plain value if no block is given
i += n
end
arr
end
end
This code would allow us to run the following:
[1,2,3,4,5,6,7,8].every(2) #= [1,3,5,7] #called without a block
[1,2,3,4,5,6,7,8,9,10].every(3) {|el| el + 1 } #= [2,5,8,11] #called with a block
Blocks allow for expressive syntax (often called internal DSLs), for example, the Sinatra web microframework.
Sinatra uses methods with blocks to succinctly define http interaction.
eg.
get '/account/:account' do |account|
#code to serve of a page for this account
end
This sort of simplicity would be hard to achieve without Ruby's blocks.
I hope this has allowed you to see how powerful this language feature is.
I think it was mostly because Matz was interested in exploring what a fully object oriented scripting language would look like when he built it; this feature is based heavily on the CLU programming language's iterators.
It has turned out to provide some interesting benefits; a class that provides an each method can 'mix in' the Enumerable module to provide a huge variety of pre-made iteration routines to clients, which reduces the amount of tedious boiler-plate array/list/hash/etc iteration code that must be written. (Ever see java 4 and earlier iterators?)
I think you are kind of biased when you ask that question. Another might ask "why were C for loops designed that way?". Think about it - why would I need to introduce counter variable if I only want to iterate through array's elements? Say, compare these two (both in pseudocode):
for (i = 0; i < len(array); i++) {
elem = array[i];
println(elem);
}
and
for (elem in array) {
println(elem);
}
Why would the first feel more natural than the second, except for historical (almost sociological) reasons?
And Ruby, highly object-oriented as is, takes this even further, making it an array method:
array.each do |elem|
puts elem
end
By making that decision, Matz just made the language lighter for superfluous syntax construct (foreach loop), delegating its use to ordinary methods and blocks (closures). I appreciate Ruby the most just for this very reason - being really rational and economical with language features, but retaining expressiveness.
I know, I know, we have for in Ruby, but most of the people consider it unneccessary.
The do ... end blocks (or { ... }) form a so-called block (almost a closure, IIRC). Think of a block as an anonymous method, that you can pass as argument to another method. Blocks are used a lot in Ruby, and thus this form of iteration is natural for it: the do ... end block is passed as an argument to the method each. Now you can write various variations to each, for example to iterate in reverse or whatnot.
There's also the syntactic sugar form:
for element in array
# Do stuff
end
Blocks are also used for example to filter an array:
array = (1..10).to_a
even = array.select do |element|
element % 2 == 0
end
# "even" now contains [2, 4, 6, 8, 10]
I think it's because it emphasizes the "everything is an object" philosophy behind Ruby: the each method is called on the object.
Then switching to another iterator is much smoother than changing the logic of, for example, a for loop.
Ruby was designed to be expressive, to read as if it was being spoken... Then I think it just evolved from there.
This comes from Smalltalk, that implements control structures as methods, thus reducing the number of keywords and simplifying the parser. Thus allowing controll strucures to serve as proff of concept for the language definition.
In ST, even if conditions are methods, in the fashion:
boolean.ifTrue ->{executeIfBody()}, :else=>-> {executeElseBody()}
In the end, If you ignore your cultural bias, what will be easier to parse for the machine will also be easier to parse by yourself.

Resources