What is the purpose of the Enumerator class in Ruby

If I create an Enumerator like so:
enum = [1,2,3].each => #<Enumerator: [1, 2, 3]:each>
enum is an Enumerator. What is the purpose of this object? I can't say this:
enum { |i| puts i }
But I can say this:
enum.each { |i| puts i }
That seems redundant, because the Enumerator was created with .each. It seems like it's storing some data regarding the each method.
I don't understand what's going on here. I'm sure there is some logical reason we have this Enumerator class, but what can it do that an Array can't? I thought maybe it was an ancestor of Array and other Enumerables, but it doesn't seem to be. What exactly is the reason for the existence of the Enumerator class, and in what context would it ever be used?

What happens if you do enum = [1,2,3].each; enum.next?:
enum = [1,2,3].each
=> #<Enumerator: [1, 2, 3]:each>
enum.next
=> 1
enum.next
=> 2
enum.next
=> 3
enum.next
StopIteration: iteration reached an end
This can be useful when you have an Enumerator that does a calculation, such as a prime-number calculator, or a Fibonacci-sequence generator. It provides flexibility in how you write your code.
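As a rough illustration of such a calculating enumerator, here is a minimal sketch of a Fibonacci generator built with Enumerator.new. The name fib is only an illustrative choice and does not come from the answer above.

```ruby
# A Fibonacci generator: values are computed only when they are requested.
fib = Enumerator.new do |yielder|
  a, b = 0, 1
  loop do
    yielder << a
    a, b = b, a + b
  end
end

first_three = [fib.next, fib.next, fib.next]  # => [0, 1, 1]
fib.rewind                                    # external position goes back to the start
first_ten = fib.take(10)                      # => [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

The same object supports both styles: next pulls one value at a time, while take (from Enumerable) stops the otherwise endless loop after ten values.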

I think the main purpose is to get elements on demand instead of getting them all in a single loop. I mean something like this:
e = [1, 2, 3].each
... do stuff ...
first = e.next
... do stuff with first ...
second = e.next
... do more stuff with second ...
Note that those do stuff parts can be in different functions far far away from each other.
Lazily evaluated infinite sequences (e.g. primes, Fibonacci numbers, string keys like 'a'..'z','aa'..'az','ba'..'zz','aaa'.. etc.) are a good use case for enumerators.
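The string-key case can be sketched with String#succ, which already rolls "z" over to "aa". This is only one possible way to build such a sequence, and the variable name keys is an assumption made for the example.

```ruby
# Infinite sequence of keys: "a".."z", then "aa".."az", "ba".."zz", "aaa", ...
keys = Enumerator.new do |yielder|
  key = "a"
  loop do
    yielder << key
    key = key.succ   # succ returns a new string, so yielded keys are never mutated
  end
end

first_keys = keys.take(3)                  # => ["a", "b", "c"]
around_z   = keys.lazy.drop(25).first(3)   # => ["z", "aa", "ab"]
```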

As answered so far, Enumerator comes in handy when you want to iterate through a sequence of data of potentially infinite length.
Take a prime number generator prime_generator that extends Enumerator for example. If we want to get the first 5 primes, we can simply write prime_generator.take 5 instead of embedding the "limit" into the generating logic. Thus we can separate generating prime numbers and taking a certain amount out of generated prime numbers making the generator reusable.
I for one like method chaining using methods of Enumerable returning Enumerator like the following example (it may not be a "purpose" but I want to just point out an aesthetic aspect of it):
prime_generator.take_while{|p| p < n}.each_cons(2).find_all{|pair| pair[1] - pair[0] == 2}
Here the prime_generator is an instance of Enumerator that returns primes one by one. We can take prime numbers below n using take_while method of Enumerable. The methods each_cons and find_all both return Enumerator so they can be chained. This example is meant to generate twin primes below n. This may not be an efficient implementation but is easily written within a line and IMHO suitable for prototyping.
Here is a pretty straightforward implementation of prime_generator based on Enumerator:
def prime?(n)
n == 2 or
(n >= 3 and n.odd? and (3...n).step(2).all?{|k| n%k != 0})
end
prime_generator = Enumerator.new do |yielder|
n = 1
while true
yielder << n if prime? n
n += 1
end
end
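Putting the pieces of this answer together, a self-contained run might look like the sketch below. It repeats the generator under the name primes so that it can be executed on its own; limit is a stand-in for the n used in the one-liner above.

```ruby
def prime?(n)
  n == 2 || (n >= 3 && n.odd? && (3...n).step(2).all? { |k| n % k != 0 })
end

# Endless generator: the limit lives in the caller, not in the generator.
primes = Enumerator.new do |yielder|
  n = 1
  loop do
    yielder << n if prime?(n)
    n += 1
  end
end

first_five = primes.take(5)   # => [2, 3, 5, 7, 11]

limit = 20
twins = primes.take_while { |q| q < limit }
              .each_cons(2)
              .find_all { |a, b| b - a == 2 }
# => [[3, 5], [5, 7], [11, 13], [17, 19]]
```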

It is possible to combine enumerators:
array.each.with_index { |el, idx| ... }
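A concrete version of that one-liner, as a small sketch with made-up sample data:

```ruby
letters = %w[a b c]

# each without a block returns an Enumerator; with_index(1) starts counting at 1.
labeled = letters.each.with_index(1).map { |el, idx| "#{idx}:#{el}" }
# => ["1:a", "2:b", "3:c"]

# map without a block also returns an Enumerator, so the index is available inside map.
scaled = [10, 20, 30].map.with_index { |el, idx| el * idx }
# => [0, 20, 60]
```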

To understand the major advantage of the enumerator class, you first need to distinguish internal and external iterators. With internal iterators, the iterator itself controls the iteration. With external iterators, the client (often times the programmer) controls the iteration. Clients that use an external iterator must advance the traversal and request the next element explicitly from the iterator. In contrast, the client hands an internal iterator an operation to perform, and the iterator applies that operation to every element in the collection.
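The distinction can be seen side by side in a short sketch (the colors array is just sample data):

```ruby
colors = %w[red green blue]

# Internal iteration: each drives the loop and calls the block for every element.
internal = []
colors.each { |c| internal << c.upcase }

# External iteration: the caller decides when to ask for the next element.
enum = colors.each
external = []
external << enum.next.upcase   # "RED"
external << enum.peek          # "green" (peek looks ahead without advancing)
external << enum.next.upcase   # "GREEN"
```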
In Ruby, the Enumerator class enables you to make use of external iterators. And once you understand external iterators you will begin to discover a lot of advantages. First, let's look at how the Enumerator class facilitates external iteration:
class Fruit
  def initialize
    @kinds = %w(apple orange pear banana)
  end
  def kinds
    yield @kinds.shift
    yield @kinds.shift
    yield @kinds.shift
    yield @kinds.shift
  end
end
f = Fruit.new
enum = f.to_enum(:kinds)
enum.next
=> "apple"
f.instance_variable_get :@kinds
=> ["orange", "pear", "banana"]
enum.next
=> "orange"
f.instance_variable_get :@kinds
=> ["pear", "banana"]
enum.next
=> "pear"
f.instance_variable_get :@kinds
=> ["banana"]
enum.next
=> "banana"
f.instance_variable_get :@kinds
=> []
enum.next
StopIteration: iteration reached an end
Note that calling to_enum on an object and passing a symbol that names one of its methods instantiates the Enumerator class; in our example, the enum local variable holds an Enumerator instance. We then use external iteration to step through the enumeration method we created. Our enumeration method is called "kinds"; notice that it uses the yield keyword, which we typically use with blocks. Here, the enumerator yields one value at a time and pauses after each yield. When asked for another value, it resumes immediately after the last yield and executes up to the next one. When there is nothing left to yield and you call next, it raises a StopIteration exception.
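The pause-and-resume behaviour can be made visible with a small sketch that records when each part of the generator body runs. The log array and its messages are assumptions made purely for illustration.

```ruby
log = []
enum = Enumerator.new do |yielder|
  log << "before first"
  yielder << 1
  log << "between"
  yielder << 2
  log << "after last"
end

first = enum.next        # runs the body only up to the first yielded value
after_first = log.dup    # ["before first"]
second = enum.next       # resumes right after the first yield
begin
  enum.next              # body finishes, nothing left to yield
rescue StopIteration
  log << "done"
end
# log is now ["before first", "between", "after last", "done"]
```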
So what is the power of external iteration in Ruby? There are several benefits and I will highlight a few of them. First, the Enumerator class allows for chaining. For example, with_index is defined in the Enumerator class and lets us specify a starting value for the index when iterating over an Enumerator object:
f.instance_variable_set :@kinds, %w(apple orange pear banana)
enum.rewind
enum.with_index(1) do |name, i|
puts "#{name}: #{i}"
end
apple: 1
orange: 2
pear: 3
banana: 4
Second, it provides a TON of useful convenience methods from the Enumerable module. Remember Enumerator is a class and Enumerable is a module, but the Enumerable module is included in the Enumerator class and so Enumerators are Enumerable:
Enumerator.ancestors
=> [Enumerator, Enumerable, Object, Kernel, BasicObject]
f.instance_variable_set :@kinds, %w(apple orange pear banana)
enum.rewind
enum.detect {|kind| kind =~ /^a/}
=> "apple"
enum
=> #<Enumerator: #<Fruit:0x007fb86c09bdf8 @kinds=["orange", "pear", "banana"]>:kinds>
And there is one other major benefit of Enumerator that might not be immediately clear. Let me explain this through a demonstration. As you probably know, you can make any of your user-defined classes Enumerable by including the Enumerable module and defining an each instance method:
class Fruit
  include Enumerable
  attr_accessor :kinds
  def initialize
    @kinds = %w(apple orange pear banana)
  end
  def each
    @kinds.each { |kind| yield kind }
  end
end
This is cool. Now we have a ton of Enumerable instance method goodies available to us like chunk, drop_while, flat_map, grep, lazy, partition, reduce, take_while and more.
f.partition {|kind| kind =~ /^a/ }
=> [["apple"], ["orange", "pear", "banana"]]
It's interesting to note that each of the instance methods of the Enumerable module actually calls our each method behind the scenes in order to get the enumerable items. So if we were to implement the reduce method, it might look something like this:
module Enumerable
def reduce(acc)
each do |value|
acc = yield(acc, value)
end
acc
end
end
Notice how it passes a block to the each method and so our each method is expected to yield something back to the block.
But look what happens if client code calls the each method without specifying a block:
f.each
LocalJumpError: no block given (yield)
So now we can modify our each method to use enum_for, which will return an Enumerator object when a block is not given:
class Fruit
  include Enumerable
  attr_accessor :kinds
  def initialize
    @kinds = %w(apple orange pear banana)
  end
  def each
    return enum_for(:each) unless block_given?
    @kinds.each { |kind| yield kind }
  end
end
f = Fruit.new
f.each
=> #<Enumerator: #<Fruit:0x007ff70aa3b548 @kinds=["apple", "orange", "pear", "banana"]>:each>
And now we have an Enumerator instance we could control with our client code for later use.
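As a sketch of what that later use could look like, the class is repeated here so the snippet runs on its own; the numbering format is just an example.

```ruby
class Fruit
  include Enumerable

  def initialize
    @kinds = %w[apple orange pear banana]
  end

  def each
    return enum_for(:each) unless block_given?
    @kinds.each { |kind| yield kind }
  end
end

f = Fruit.new
enum = f.each            # no block given, so an Enumerator comes back
first  = enum.next       # => "apple"
second = enum.next       # => "orange"

# The blockless form also chains with Enumerator methods such as with_index.
numbered = f.each.with_index(1).map { |kind, i| "#{i}. #{kind}" }
# => ["1. apple", "2. orange", "3. pear", "4. banana"]
```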

Related

Way to refer to the receiver of 'Array#each'

I am iterating over an array, and I'm wondering if there's a shorthand to refer to the receiver of #each (or #each_with_index) method from within the iteration.
self returns main.
You should be able to just reference it:
my_thing.each {|one_thing| puts my_thing }
This is pretty similar to the answer I gave here https://stackoverflow.com/a/45421168/2981429 but slightly different.
First off, you can create a scope with self bound to the array, and then execute the each in that scope:
[1].instance_exec do
# in this scope, self is the array
# thus we can use just 'each' because the self is inferred
each do |x|
# note that since 'class' is a special keyword,
# it needs to be explicitly namespaced on self
puts self.class, x
end
end
# => prints Array, 1
You can create a utility function to do this, if you want:
def bound_each(enumerable, &blk)
enumerable.instance_exec { each &blk }
end
bound_each([1]) { |x| puts self.class, x }
# prints Array, 1
You can call your each method within an Object#tap block and reference the original receiver like that.
[1, 2, 3].tap { |i| i.each { |j| p i.dup << j } }
# [1, 2, 3, 1]
# [1, 2, 3, 2]
# [1, 2, 3, 3]
#=> [1, 2, 3]
Here the receiving object is [1, 2, 3] and is passed to the block-variable i which we can use locally or in nested scopes such as each's block.
Avoid modifying the receiving object else you may end up with undesired results such as an infinite array. Using dup could allay this possibility.
This is an interesting question. As far as I know it's not possible – the closest I can come up with would be to use inject (or reduce) and explicitly pass the receiver as an argument. A bit pointless, but there might be a use-case for it that I'm not seeing:
a = [1,2,3]
a.inject(a) do |this, element|
this == a #=> true
this.include?(element) #=> true
this
end
Apart from looking a bit redundant, you have to be very sure to return this at the end of each iteration, as the return value will become this in the next iteration. For that reason (and the fact that you could just reference your collection in an each block, as in David's answer) I don't recommend using this.
Edit - as Simple Lime pointed out in the comments – I missed the obvious Enumerator#with_object, which has the same (rather pointless) effect, but without the drawback of having to return this at the end of each iteration. For example:
a = [1,2,3]
a.map.with_object(a) do |element, this|
this == a #=> true, for each iteration
end
I still don't recommend that you use this though.

How can I create an enumerator that does certain things after iteration?

How can I create an enumerator that optionally takes a block? I want some method foo to be called with some arguments and an optional block. If it is called with a block, iteration should happen on the block, and something should be done on it, involving the arguments given. For example, if foo were to take a single array argument, apply map on it with the given block, and return the result of join applied to it, then it would look like:
foo([1, 2, 3]){|e| e * 3}
# => "369"
If it is called without a block, it should return an enumerator, on which instance methods of Enumerator (such as with_index) should be able to apply, and execute the block in the corresponding way:
enum = foo([1, 2, 3])
# => Enumerator
enum.with_index{|e, i| e * i}
# => "026"
I defined foo using a condition to see if a block is given. It is easy to implement the case where the block is given, but the part returning the enumerator is more difficult. I guess I need to implement a subclass MyEnum of Enumerator and make foo return an instance of it:
def foo a, &pr
if pr
a.map(&pr).join
else
MyEnum.new(a)
end
end
class MyEnum < Enumerator
def initialize a
@a = a
...
end
...
end
But calling MyEnum.new raises a warning message: Enumerator.new without a block is deprecated; use Object#to_enum. If I use to_enum, I think it would return a plain Enumerator instance, not the MyEnum with the specific feature built in. On top of that, I am not sure how to implement MyEnum in the first place. How can I implement such an enumerator? Or, what is the right way to do this?
You could do something like this.
def foo a, &pr
  if pr
    a.map(&pr).join
  else
    o = Object.new
    o.instance_variable_set :@a, a
    def o.each *y
      # parentheses make it explicit which block goes to map and which to foo
      foo(@a.map { |z| yield z, *y }) { |e| e }
    end
    o.to_enum
  end
end
Then we have
enum = foo([1,2,3])
enum.each { |x| 2 * x } # "246"
or
enum = foo([1,2,3])
enum.with_index { |x, i| x * i } # "026"
Inspiration was drawn from the Enumerator documentation. Note that all of your expectations about enumerators like you asked for hold, because .to_enum takes care of all that. enum is now a legitimate Enumerator!
enum.class # Enumerator

Does Hash override Enumerable#map()?

Given that map() is defined by Enumerable, how can Hash#map yield two variables to its block? Does Hash override Enumerable#map()?
Here's a little example, for fun:
ruby-1.9.2-p180 :001 > {"herp" => "derp"}.map{|k,v| k+v}
=> ["herpderp"]
It doesn't override map
Hash.new.method(:map).owner # => Enumerable
It yields two variables which get collected into an array
class Nums
include Enumerable
def each
yield 1
yield 1, 2
yield 3, 4, 5
end
end
Nums.new.to_a # => [1, [1, 2], [3, 4, 5]]
Given that map() is defined by Enumerable, how can Hash#map yield two variables to its block?
It doesn't. It yields a single object to its block, which is a two-element array consisting of the key and the value.
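This is easy to check by giving the block a single parameter, as in the following sketch (the second hash entry is added sample data):

```ruby
h = { "herp" => "derp", "foo" => "bar" }

# With one block parameter we see what each actually yields: a [key, value] array.
pairs = h.map { |pair| pair }
# => [["herp", "derp"], ["foo", "bar"]]

# With two block parameters the same array is destructured into k and v.
joined = h.map { |k, v| k + v }
# => ["herpderp", "foobar"]
```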
It's just destructuring bind:
def without_destructuring(a, b) end
without_destructuring([1, 2])
# ArgumentError: wrong number of arguments (1 for 2)
def with_destructuring((a, b)) end # Note the extra parentheses
with_destructuring([1, 2])
def with_nested_destructuring((a, (b, c))) p a; p b; p c end
with_nested_destructuring([1, [2, 3]])
# 1
# 2
# 3
# Note the similarity to
a, (b, c) = [1, [2, 3]]
Theoretically, you would have to call map like this:
hsh.map {|(k, v)| ... }
And, in fact, for inject, you actually need to do that:
hsh.inject {|acc, (k, v)| ... }
However, Ruby is more lenient with argument checking for blocks than it is for methods. In particular:
If you yield more than one object, but the block only takes a single argument, all the objects are collected into an array.
If you yield a single object, but the block takes multiple arguments, Ruby performs destructuring bind. (This is the case here.)
If you yield more objects than the block takes arguments, the extra objects get ignored.
If the block takes more arguments than you are yielding, the extra arguments are bound to nil.
Basically, the same semantics as parallel assignment.
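A few of these rules can be observed directly. The sketch below covers only the destructuring, extra-argument, and missing-argument cases, and the helper method names are invented for the example.

```ruby
def yield_pair;  yield [1, 2];  end   # yields ONE object (an array)
def yield_three; yield 1, 2, 3; end   # yields three objects
def yield_one;   yield 1;       end   # yields one non-array object

destructured = yield_pair  { |a, b| [a, b] }   # => [1, 2]    the array is unpacked
extras       = yield_three { |a, b| [a, b] }   # => [1, 2]    the 3 is ignored
padded       = yield_one   { |a, b| [a, b] }   # => [1, nil]  b is bound to nil

# Parallel assignment treats the same three cases the same way:
a, b = [1, 2]     # a == 1, b == 2
c, d = 1, 2, 3    # c == 1, d == 2
e, g = 1          # e == 1, g == nil
```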
In fact, before Ruby 1.9, block arguments actually did have assignment semantics. This allowed you to do crazy things like this:
class << (a = Object.new); attr_accessor :b end
def wtf; yield 1, 2 end
wtf {|@a, a.b| } # WTF? The block body is empty!
p @a
# 1
p a.b
# 2
This crazy stuff works (in 1.8 and older), because block argument passing is treated the same as assignment. IOW, even though the above block is empty and doesn't do anything, the fact that block arguments are passed as if they had been assigned, means that @a is set and the a.b= setter method is called. Crazy, huh? That's why it was removed in 1.9.
If you want to startle your co-workers, stop defining your setters like this:
attr_writer :foo
and instead define them like this:
define_method(:foo=) {|@foo|}
Just make sure someone else ends up maintaining it :-)

What is the advantage of creating an enumerable object using to_enum in Ruby?

Why would you create a proxy reference to an object in Ruby, by using the to_enum method rather than just using the object directly? I cannot think of any practical use for this, trying to understand this concept & where someone might use it, but all the examples I have seen seem very trivial.
For example, why use:
"hello".enum_for(:each_char).map {|c| c.succ }
instead of
"hello".each_char.map {|c| c.succ }
I know this is a very simple example, does anyone have any real-world examples?
Most built-in methods that accept a block will return an enumerator in case no block is provided (like String#each_char in your example). For these, there is no reason to use to_enum; both will have the same effect.
A few methods do not return an Enumerator, though. In those cases you might need to use to_enum.
# How many elements are equal to their position in the array?
[4, 1, 2, 0].to_enum(:count).each_with_index{|elem, index| elem == index} #=> 2
As another example, Array#product, #uniq and #uniq! didn't use to accept a block. In 1.9.2, this was changed, but to maintain compatibility, the forms without a block can't return an Enumerator. One can still "manually" use to_enum to get an enumerator:
require 'backports/1.9.2/array/product' # or use Ruby 1.9.2+
# to avoid generating a huge intermediary array:
e = many_moves.to_enum(:product, many_responses)
e.any? do |move, response|
# some criteria
end
The main use of to_enum is when you are implementing your own iterative method. You typically will have as a first line:
def my_each
return to_enum :my_each unless block_given?
# ...
end
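When the iterative method takes arguments, they need to be forwarded to to_enum as well. A minimal sketch, using a hypothetical Countdown class and each_step method that are not part of the answer above:

```ruby
class Countdown
  def initialize(from)
    @from = from
  end

  # Yields from, from - step, from - 2 * step, ... without going below 1.
  def each_step(step = 1)
    # Forward the argument so the Enumerator replays the call with the same step.
    return to_enum(:each_step, step) unless block_given?
    @from.step(1, -step) { |n| yield n }
  end
end

c = Countdown.new(10)
by_three = c.each_step(3).to_a                   # => [10, 7, 4, 1]
indexed  = c.each_step(3).with_index.to_a        # => [[10, 0], [7, 1], [4, 2], [1, 3]]
doubled  = c.each_step(4).map { |n| n * 2 }      # => [20, 12, 4]
```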
I think it has something to do with internal and external Iterators. When you return an enumerator like this:
p = "hello".enum_for(:each_char)
p is an external enumerator. One advantage of external iterators is that:
External iterators are more flexible than internal iterators. It's easy to compare two collections for equality with an external iterator, for example, but it's practically impossible with internal iterators…. But on the other hand, internal iterators are easier to use, because they define the iteration logic for you. [From The Ruby Programming Language book, ch. 5.3]
So, with an external iterator you can do, e.g.:
p = "hello".enum_for(:each_char)
loop do
puts p.next
end
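The equality comparison mentioned in the quote could be sketched as follows. The names same_sequence?, next_or_done and DONE are assumptions made for the example, not anything from the book.

```ruby
# Sentinel object that marks an exhausted enumerator.
DONE = Object.new

def next_or_done(enum)
  enum.next
rescue StopIteration
  DONE
end

# Walks both collections in lockstep and stops at the first difference.
def same_sequence?(a, b)
  ea = a.to_enum
  eb = b.to_enum
  loop do
    x = next_or_done(ea)
    y = next_or_done(eb)
    return true if x.equal?(DONE) && y.equal?(DONE)
    return false if x.equal?(DONE) || y.equal?(DONE) || x != y
  end
end

same_sequence?([1, 2, 3], 1..3)                 # => true
same_sequence?([1, 2], [1, 2, 3])               # => false
same_sequence?([1, 2, 4], 1..Float::INFINITY)   # => false, stops at the third element
```

Because both sides are advanced one element at a time, the comparison also terminates when one side is infinite, as long as a difference or the end of the other side is reached.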
Let's say we want to take an array of keys and an array of values and sew them up in a Hash:
With #to_enum
def hashify(k, v)
  keys = k.to_enum(:each)
  values = v.to_enum(:each)
  hash = {}
  loop do
    hash[keys.next] = values.next
    # No need to check for bounds,
    # as #next will raise a StopIteration which breaks from the loop
  end
  hash
end
Without #to_enum:
def hashify(k, v)
  hash = {}
  k.each_with_index do |key, index|
    break if index == v.length
    hash[key] = v[index]
  end
  hash
end
It's much easier to read the first method, don't you think? Not a ton easier, but imagine if we were somehow manipulating items from 3 arrays? 5? 10?
This isn't quite an answer to your question, but hopefully it is relevant.
In your second example you are calling each_char without passing a block. When called without a block each_char returns an Enumerator so your examples are actually just two ways of doing the same thing. (i.e. both result in the creation of an enumerable object.)
irb(main):016:0> e1 = "hello".enum_for(:each_char)
=> #<Enumerator:0xe15ab8>
irb(main):017:0> e2 = "hello".each_char
=> #<Enumerator:0xe0bd38>
irb(main):018:0> e1.map { |c| c.succ }
=> ["i", "f", "m", "m", "p"]
irb(main):019:0> e2.map { |c| c.succ }
=> ["i", "f", "m", "m", "p"]
It's great for large or infinite generator objects.
E.g., the following will give you an enumerator for the whole Fibonacci sequence, from 0 to infinity.
def fib_sequence
return to_enum(:fib_sequence) unless block_given?
yield 0
yield 1
x, y = 0, 1
loop { x,y = y,x+y; yield(y) }
end
to_enum effectively allows you to write this with regular yields without having to mess with Fibers.
You can then slice it as you want, and it will be very memory efficient, since no arrays will be stored in memory:
module Slice
def slice(range)
return to_enum(:slice, range) unless block_given?
start, finish = range.first, range.max + 1
copy = self.dup
start.times { copy.next }
(finish-start).times { yield copy.next }
end
end
class Enumerator
include Slice
end
fib_sequence.slice(0..10).to_a
#=> [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
fib_sequence.slice(10..20).to_a
#=> [55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]
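Without reopening Enumerator, a similar slice can probably be obtained with the built-in lazy enumerator (Ruby 2.0+). This is a sketch of an alternative, not the approach used above, and it rebuilds the sequence with Enumerator.new under the assumed name fib so that it runs standalone.

```ruby
fib = Enumerator.new do |yielder|
  x, y = 0, 1
  loop do
    yielder << x
    x, y = y, x + y
  end
end

# drop and first on a lazy enumerator skip and take without building the full list.
head  = fib.lazy.first(11)
# => [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
slice = fib.lazy.drop(10).first(11)
# => [55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]
```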

Are there something like Python generators in Ruby?

I am new to Ruby, is there a way to yield values from Ruby functions? If yes, how? If not, what are my options to write lazy code?
Ruby's yield keyword is something very different from the Python keyword with the same name, so don't be confused by it. Ruby's yield keyword is syntactic sugar for calling a block associated with a method.
The closest equivalent is Ruby's Enumerator class. For example, the equivalent of the Python:
def eternal_sequence():
i = 0
while True:
yield i
i += 1
is this:
def eternal_sequence
Enumerator.new do |enum|
i = 0
while true
enum.yield i # <- Notice that this is the yield method of the enumerator, not the yield keyword
i +=1
end
end
end
You can also create Enumerators for existing enumeration methods with enum_for. For example, ('a'..'z').enum_for(:each_with_index) gives you an enumerator of the lowercase letters along with their place in the alphabet. You get this for free with the standard Enumerable methods like each_with_index in 1.9, so you can just write ('a'..'z').each_with_index to get the enumerator.
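Both spellings can be tried side by side in a short sketch; note that the positions are zero-based.

```ruby
letters = ('a'..'z').enum_for(:each_with_index)
first  = letters.next        # => ["a", 0]
second = letters.next        # => ["b", 1]

# Calling each_with_index without a block gives an equivalent enumerator.
same = ('a'..'z').each_with_index
pairs = same.first(2)        # => [["a", 0], ["b", 1]]
```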
I've seen Fibers used in that way, look at an example from this article:
fib = Fiber.new do
x, y = 0, 1
loop do
Fiber.yield y
x,y = y,x+y
end
end
20.times { puts fib.resume }
If you are looking to lazily generate values, @Chuck's answer is the correct one.
If you are looking to lazily iterate over a collection, Ruby 2.0 introduced the new .lazy enumerator.
range = 1..Float::INFINITY
puts range.map { |x| x+1 }.first(10) # infinite loop
puts range.lazy.map { |x| x+1 }.first(10) # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
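Lazy enumerators can also be chained, with each element flowing through the whole pipeline before the next one is generated. A small sketch with an arbitrary pipeline chosen for illustration:

```ruby
evens_squared = (1..Float::INFINITY).lazy
                                    .select(&:even?)
                                    .map { |x| x * x }
                                    .first(3)
# => [4, 16, 36]
```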
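Ruby 1.8 supported generators out of the box through the Generator class in its standard library. That library was removed in Ruby 1.9, so the following only runs on 1.8: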
require 'generator'
# Generator from an Enumerable object
g = Generator.new(['A', 'B', 'C', 'Z'])
while g.next?
puts g.next
end
# Generator from a block
g = Generator.new { |g|
for i in 'A'..'C'
g.yield i
end
g.yield 'Z'
}
# The same result as above
while g.next?
puts g.next
end
https://ruby-doc.org/stdlib-1.8.7/libdoc/generator/rdoc/Generator.html
The Enumerator class and its next method behave similarly:
https://docs.ruby-lang.org/en/3.1/Enumerator.html#method-i-next
range = 1..Float::INFINITY
enumerator = range.each
puts enumerator.class # => Enumerator
puts enumerator.next # => 1
puts enumerator.next # => 2
puts enumerator.next # => 3
