How does the enumerator receive a block? - ruby

I have:
letters = %w(e d c b a)
In the following:
letters.group_by.each_with_index { |item, index| index % 3 }
#=> {0=>["e", "b"], 1=>["d", "a"], 2=>["c"]}
how can the enumerator returned by group_by know the block it will execute? Is the block received by each_with_index passed to the enumerator which it is based on?
In the following:
letters.each_with_index.group_by { |item, index| index % 3 }
#=> {0=>[["e", 0], ["b", 3]], 1=>[["d", 1], ["a", 4]], 2=>[["c", 2]]}
will the block be passed to the enumerator returned by each_with_index? If it will, how does each_with_index execute it?
In general:
How is a block retrieved by a method in an enumerator that doesn't directly receive the block?
Will a block be passed through an enumerator chain? Where will it be executed?

There's some tricky stuff going on here, so that's probably why you're a little hazy on how it works. Enumerators are one of the most important things in Ruby: they're the backbone of the Enumerable system, which is where Ruby really shines. But they're often used in ways where they're transparent, living in the shadows, so you rarely have to pay direct attention to them.
Looking more closely, step through this bit by bit:
letters.group_by
# => #<Enumerator: ["e", "d", "c", "b", "a"]:group_by>
Now this is an Enumerator instance. The each_with_index you chain on the end is actually an Enumerator-specific method:
letters.group_by.method(:each_with_index)
# => #<Method: Enumerator#each_with_index>
This is in contrast to your second approach:
letters.method(:each_with_index)
# => #<Method: Array(Enumerable)#each_with_index>
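That one is Array's method. Called without a block, it returns an Enumerator, which conveniently lets you chain into an Enumerable method like group_by.
So the story here is that group_by without a block returns an Enumerator that remembers the receiver and the method name. Enumerator#each_with_index then calls group_by for you, handing it a block that invokes your block with each item and a running index; your block's return value goes back to group_by as the grouping key.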
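As a rough sketch of what the first chain amounts to, you can call group_by directly with a block that keeps its own counter. This is hand-written for illustration and is not how Enumerator is implemented internally; the index counter and the by_hand name are assumptions of the example.

```ruby
letters = %w(e d c b a)

# The chained form from the question.
chained = letters.group_by.each_with_index { |item, index| index % 3 }
#=> {0=>["e", "b"], 1=>["d", "a"], 2=>["c"]}

# A hand-written equivalent: group_by receives a block that tracks the
# index itself and returns the grouping key, much as each_with_index
# does on the caller's behalf.
index = 0
by_hand = letters.group_by do |item|
  key = index % 3
  index += 1
  key
end

by_hand == chained #=> true
```

Either way, it is group_by that collects the items; the block only decides the key.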


How to understand the work flow in ruby enumerator chain

The code below produces two different results.
letters = %w(e d c b a)
letters.group_by.each_with_index { |item, index| index % 3 }
#=> {0=>["e", "b"], 1=>["d", "a"], 2=>["c"]}
letters.each_with_index.group_by { |item, index| index % 3 }
#=> {0=>[["e", 0], ["b", 3]], 1=>[["d", 1], ["a", 4]], 2=>[["c", 2]]}
I think the execution flow is from right to left, and the data flow is from the left to right. The block should be passed as parameter from right to left.
Using puts, I observed that the block is executed in the inner each.
In the first chain, group_by should ask each for data, each will return the result of index%3, and group_by should process the result and yield it to another block. But how is the block passed? If the block is executed in each, each would not pass two parameters item and index but only one parameter item.
In the second chain, in my understanding, each_with_index will receive the data from each method first; each yields to index%3. In that case, how can each_with_index process index%3?
It seems my understanding is somehow wrong. Can anyone illustrate these two examples in detail and give the general workflow in such cases?
Proxy objects
Both execution and data flows are from left to right, as with any method call in Ruby.
Conceptually, though, it can help to read Enumerator call chains from right to left, because they're a kind of proxy object.
Called without a block, they just remember which methods have been called and in which order. The method is only really called when it's needed, for example when the Enumerator is converted back to an Array or the elements are printed on screen.
If no such method is called at the end of the chain, basically nothing happens:
[1,2,3].each_with_index.each_with_index.each_with_index.each_with_index
# #<Enumerator: ...>
[1,2,3].each_with_index.each_with_index.each_with_index.each_with_index.to_a
# [[[[[1, 0], 0], 0], 0], [[[[2, 1], 1], 1], 1], [[[[3, 2], 2], 2], 2]]
This behaviour makes it possible to work with very large streams of objects, without needing to pass huge arrays between method calls. If the output isn't needed, nothing is calculated. If 3 elements are needed at the end, only 3 elements are calculated.
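A small sketch can make the "only 3 elements are calculated" point concrete. The endless source and the produced counter below are hypothetical names made up for this example; the idea is just to count how many elements get generated when only the first three are requested.

```ruby
produced = 0

# A hypothetical endless source that counts how many elements it generates.
source = Enumerator.new do |yielder|
  n = 0
  loop do
    produced += 1
    yielder << n
    n += 1
  end
end

chain = source.each_with_index  # no block given, so nothing is generated yet
produced_before = produced      # still 0

first_three = chain.first(3)    # iteration stops once 3 elements are collected
# first_three is [[0, 0], [1, 1], [2, 2]] and produced is now 3
```

Note that this relies on first stopping the iteration early; a method such as map without lazy would try to consume the whole source.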
The proxy pattern is heavily used in Rails, for example with ActiveRecord::Relation:
@person = Person.where(name: "Jason").where(age: 26)
It would be inefficient to launch 2 DB queries in this case, but you can only know that at the end of the chained methods. Here's a related answer: How does Rails ActiveRecord chain “where” clauses without multiple queries?
MyEnumerator
Here's a quick and dirty MyEnumerator class. It might help you understand the logic for the method calls in your question:
class MyEnumerator < Array
  def initialize(*p)
    @methods = [] # method names, in the order they were called
    @blocks = []  # the block given to each call (nil if none)
    super
  end
  # Record the call instead of executing it, and return self so calls can be chained.
  def group_by(&b)
    save_method_and_block(__method__, &b)
    self
  end
  def each_with_index(&b)
    save_method_and_block(__method__, &b)
    self
  end
  def to_s
    "MyEnumerable object #{inspect} with methods : #{@methods} and #{@blocks}"
  end
  # Replay the recorded calls, from left to right, on a plain Array.
  def apply
    result = to_a
    puts "Starting with #{result}"
    @methods.zip(@blocks).each do |m, b|
      if b
        puts "Apply method #{m} with block #{b} to #{result}"
      else
        puts "Apply method #{m} without block to #{result}"
      end
      result = result.send(m, &b)
    end
    result
  end
  private
  def save_method_and_block(method, &b)
    @methods << method
    @blocks << b
  end
end
letters = %w[e d c b a]
puts MyEnumerator.new(letters).group_by.each_with_index { |_, i| i % 3 }.to_s
# MyEnumerable object ["e", "d", "c", "b", "a"] with methods : [:group_by, :each_with_index] and [nil, #<Proc:0x00000001da2518@my_enumerator.rb:35>]
puts MyEnumerator.new(letters).group_by.each_with_index { |_, i| i % 3 }.apply
# Starting with ["e", "d", "c", "b", "a"]
# Apply method group_by without block to ["e", "d", "c", "b", "a"]
# Apply method each_with_index with block #<Proc:0x00000000e2cb38@my_enumerator.rb:42> to #<Enumerator:0x00000000e2c610>
# {0=>["e", "b"], 1=>["d", "a"], 2=>["c"]}
puts MyEnumerator.new(letters).each_with_index.group_by { |_item, index| index % 3 }.to_s
# MyEnumerable object ["e", "d", "c", "b", "a"] with methods : [:each_with_index, :group_by] and [nil, #<Proc:0x0000000266c220@my_enumerator.rb:48>]
puts MyEnumerator.new(letters).each_with_index.group_by { |_item, index| index % 3 }.apply
# Starting with ["e", "d", "c", "b", "a"]
# Apply method each_with_index without block to ["e", "d", "c", "b", "a"]
# Apply method group_by with block #<Proc:0x0000000266bd70@my_enumerator.rb:50> to #<Enumerator:0x0000000266b938>
# {0=>[["e", 0], ["b", 3]], 1=>[["d", 1], ["a", 4]], 2=>[["c", 2]]}
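For the second chain, a hand-written equivalent may help explain why the grouped values are [item, index] pairs. The pairs and by_hand names below are assumptions for illustration only: each_with_index supplies [item, index] pairs as the elements, and group_by groups those pairs as they are.

```ruby
letters = %w(e d c b a)

# The chained form from the question.
chained = letters.each_with_index.group_by { |item, index| index % 3 }

# Materialize what each_with_index hands to group_by ...
pairs = letters.each_with_index.to_a
#=> [["e", 0], ["d", 1], ["c", 2], ["b", 3], ["a", 4]]

# ... and group those pairs directly. The block destructures each pair
# into item and index, but the element stored in the hash is the whole pair.
by_hand = pairs.group_by { |item, index| index % 3 }

by_hand == chained #=> true
```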

Ruby enumerable reset?

I'm having trouble understanding exactly how much state a ruby enumerable keeps.
I know some python, so I was expecting that after I take an item from an enumerable, it's gone and the next item will be returned as I take another item.
Strangely, this does happen when I use next but not when I use anything like take or first.
Here's an example:
a = [1,2,3].to_enum
# => #<Enumerator: [1, 2, 3]:each>
a.take(2)
# => [1, 2]
a.next
# => 1
a.next
# => 2
a.take(2)
# => [1, 2]
a.next
# => 3
a.next
# StopIteration: iteration reached an end
# from (irb):58:in `next'
# from (irb):58
# from /usr/bin/irb:12:in `<main>'
a.take(2)
# => [1, 2]
It seems like the enumerable keeps state between next calls, but resets before each take call?
It may be a little confusing, but it's important to note that in Ruby there is the Enumerator class and the Enumerable module.
The Enumerator class includes Enumerable (like most enumerable objects, such as Array, Hash, etc.).
The next method is provided as part of the Enumerator, which indeed has an internal state. You can consider an Enumerator very close to the concept of Iterator exposed by other languages.
When you instantiate the Enumerator, the internal pointer points to the first item in the collection.
2.1.5 :021 > a = [1,2,3].to_enum
=> #<Enumerator: [1, 2, 3]:each>
2.1.5 :022 > a.next
=> 1
2.1.5 :023 > a.next
=> 2
This is not the only purpose of the Enumerator (otherwise it would probably be called Iterator). However, this is one of the documented features.
An Enumerator can also be used as an external iterator. For example, #next returns the next value of the iterator or raises StopIteration if the Enumerator is at the end.
e = [1,2,3].each # returns an enumerator object.
puts e.next # => 1
puts e.next # => 2
puts e.next # => 3
puts e.next # raises StopIteration
But as I said before, the Enumerator class includes Enumerable. It means every instance of an Enumerator exposes the Enumerable methods that are designed to work on a collection. In this case, the collection is the one the Enumerator wraps.
take is a generic Enumerable method. It is designed to return the first N elements from enum. It's important to note that enum refers to any generic class that includes Enumerable, not to the Enumerator specifically. Therefore, take(2) returns the first two elements of the collection, regardless of the position of the pointer inside the Enumerator instance.
Let me show you a practical example. I can create a custom class, and implement Enumerable.
class Example
  include Enumerable
  def initialize(array)
    @array = array
  end
  def each(*args, &block)
    @array.each(*args, &block)
  end
end
I can mix in Enumerable, and as long as I provide an implementation for each I get all the other methods for free, including take.
e = Example.new([1, 2, 3])
=> #<Example:0x007fa9529be760 @array=[1, 2, 3]>
e.take(2)
=> [1, 2]
As expected, take returns the first 2 elements. take ignores everything else in my implementation, including any state or pointers, exactly as it does for an Enumerator.
Per the documentation, Enumerable#take returns the first n elements of the enumerable, not the next n elements from the cursor. Only the methods defined on Enumerator itself (such as next, peek, and rewind) operate on that internal cursor; the Enumerable mix-in is just a collection of methods for enumerating, which don't share that cursor.
If you wanted, you could implement Enumerator#take to do what you expect:
class Enumerator
  # Note: this overrides Enumerable#take for every Enumerator.
  def take(n = 1)
    n.times.map { self.next }
  end
end
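If reopening a core class feels too invasive, a standalone helper is one possible alternative. The take_next name below is made up for this sketch; it advances the enumerator's cursor with next and simply stops when the enumerator runs out.

```ruby
# Hypothetical helper: take up to n elements starting at the cursor.
def take_next(enum, n)
  taken = []
  n.times { taken << enum.next }
  taken
rescue StopIteration
  # Fewer than n elements were left; return what was collected.
  taken
end

e = [1, 2, 3].each
first = e.next            # => 1
rest = take_next(e, 5)    # => [2, 3]
from_start = e.take(2)    # => [1, 2], Enumerable#take ignores the cursor
```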

Differences between these 2 Ruby enumerators: [1,2,3].map vs. [1,2,3].group_by

In Ruby, is there a functional difference between these two Enumerators?
irb> enum_map = [1,2,3].map
=> #<Enumerator: [1, 2, 3]:map> # ends with "map>"
irb> enum_group_by = [1,2,3].group_by
=> #<Enumerator: [1, 2, 3]:group_by> # ends with "group_by>"
irb> enum_map.methods == enum_group_by.methods
=> true # they have the same methods
What can #<Enumerator: [1, 2, 3]:map> do that #<Enumerator: [1, 2, 3]:group_by> can't do, and vice versa?
Thanks!
From the documentation of group_by:
Groups the collection by result of the block. Returns a hash where the
keys are the evaluated result from the block and the values are arrays
of elements in the collection that correspond to the key.
If no block is given an enumerator is returned.
(1..6).group_by { |i| i%3 } #=> {0=>[3, 6], 1=>[1, 4], 2=>[2, 5]}
From the documentation of map:
Returns a new array with the results of running block once for every
element in enum.
If no block is given, an enumerator is returned instead.
(1..4).map { |i| i*i } #=> [1, 4, 9, 16]
(1..4).collect { "cat" } #=> ["cat", "cat", "cat", "cat"]
As you can see, each does something different, which serves a different purpose. Concluding that two APIs are the same because they expose the same interface seems to miss the entire purpose of Object Oriented Programming - different services are supposed to expose the same interface to enable polymorphism.
There's a difference in what they do, but fundamentally they are both of the same class: Enumerator.
When they're used with a block, the results differ (map returns an array of the block's return values, while group_by returns a hash keyed by them), yet the interface to them is identical.
Two objects of the same class generally have the same methods. It is possible to augment an instance with additional methods, but this is not normally done.
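A short sketch of that difference, using the two enumerators from the question with the same Enumerator methods (the blocks are arbitrary examples):

```ruby
enum_map = [1, 2, 3].map
enum_group_by = [1, 2, 3].group_by

# Same interface: both respond to each, with_index, and so on.
doubled = enum_map.each { |x| x * 2 }                      # => [2, 4, 6]
products = enum_map.with_index { |x, i| x * i }            # => [0, 2, 6]

# Different result: group_by builds a hash from the block's return values.
by_parity = enum_group_by.each { |x| x.odd? }
# true maps to [1, 3], false maps to [2]
by_index = enum_group_by.with_index { |x, i| i.even? }
# true maps to [1, 3], false maps to [2]
```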

How to #rewind the internal position under #each?

I'm trying to write code where the enumeration sequence is rewound to the beginning.
I think rewind is appropriate for this application, but I'm not sure how to use it with an each iterator that passes elements to a block. In the Ruby-Docs example, next is used to move the internal position one step at a time. With a block, it would advance on its own.
There aren't many good examples online for this specifically. My workaround at the moment is to nest an iterator under a loop and use break inside the iterator. When the iterator breaks, the loop resets the enumeration sequence.
Is there a better way—as I'm sure there is—of doing this?
Use the Enumerator#rewind method from the Ruby core class library.
Rewinds the enumeration sequence to the beginning. If the enclosed object responds to a “rewind” method, it is called.
a = [1,2,3,4]
enum = a.each
enum # => #<Enumerator: [1, 2, 3, 4]:each>
enum.next # => 1
enum.next # => 2
enum.rewind # => #<Enumerator: [1, 2, 3, 4]:each>
enum.next # => 1
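To get block-like iteration while still being able to rewind, one option is to drive the enumerator with next inside Kernel#loop, which rescues StopIteration for you. This is only a sketch; the restart condition (rewinding once after seeing 2) and the variable names are assumptions for illustration.

```ruby
enum = [1, 2, 3, 4].each
seen = []
restarted = false

# loop ends quietly when enum.next raises StopIteration.
loop do
  value = enum.next
  seen << value

  # Hypothetical condition: go back to the start once, after the value 2.
  if value == 2 && !restarted
    enum.rewind
    restarted = true
  end
end

# seen is now [1, 2, 1, 2, 3, 4]
```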

Ruby Hash/Array delete_if without a block

According to Ruby Hash/Array documentation, the delete_if method returns an enumerator if no block is given. How is this useful? Can someone give an example to demonstrate this pattern?
There are some methods defined on Enumerator that give flexibility to iterators. One such method I often use is with_index.
p %w[a b c d e f].delete_if.with_index{|_, i| i.even?}
# => ["b", "d", "f"]
If this were to be done without the Enumerator class, all kinds of methods would have to be defined, including delete_if_with_index, and that is not a good thing.
The enumerator just allows you to run the block later. For example, if you had a method that specifically handled the deletion for several different objects, you could pass it the enumerator.
In the example below, it will print 1, 3, 5:
arr = [0,1,2,3,4,5]
enumerator = arr.delete_if
enumerator.each { |el| el.even? }
puts arr.join(', ')
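The "pass it the enumerator" idea could look roughly like this. The delete_evens method is a made-up name for the sketch; it only knows it has been handed an enumerator and supplies the block, while the caller decides which collection is being modified.

```ruby
require "set"

# Hypothetical helper: supplies the deletion rule to whatever
# delete_if enumerator it is given.
def delete_evens(deleter)
  deleter.each { |el| el.even? }
end

arr = [0, 1, 2, 3, 4, 5]
numbers = Set[10, 11, 12]

delete_evens(arr.delete_if)      # arr is modified in place
delete_evens(numbers.delete_if)  # works for a Set as well

# arr is now [1, 3, 5] and numbers contains only 11
```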
