How to understand the work flow in ruby enumerator chain - ruby

The code below produces two different results.
letters = %w(e d c b a)
letters.group_by.each_with_index { |item, index| index % 3 }
#=> {0=>["e", "b"], 1=>["d", "a"], 2=>["c"]}
letters.each_with_index.group_by { |item, index| index % 3 }
#=> {0=>[["e", 0], ["b", 3]], 1=>[["d", 1], ["a", 4]], 2=>[["c", 2]]}
I think the execution flow is from right to left, and the data flow is from the left to right. The block should be passed as parameter from right to left.
Using puts, I observed that the block is executed in the inner each.
In the first chain, group_by should ask each for data, each will return the result of index%3, and group_by should process the result and yield it to another block. But how is the block passed? If the block is executed in each, each would not pass two parameters item and index but only one parameter item.
In the second chain, in my understanding, each_with_index will receive the data from each method first; each yields to index%3. In that case, how can each_with_index process index%3?
It seems my understanding is wrong somewhere. Can anyone walk through these two examples in detail and describe the general workflow in such cases?

Proxy objects
Both execution and data flows are from left to right, as with any method call in Ruby.
Conceptually, it can help to read Enumerator call chains from right to left, though, because they're a kind of proxy object.
Called without a block, they just remember which methods were called and in which order. The method is only really called when its result is needed, for example when the Enumerator is converted back to an Array or the elements are printed on screen.
If no such method is called at the end of the chain, basically nothing happens:
[1,2,3].each_with_index.each_with_index.each_with_index.each_with_index
# #<Enumerator: ...>
[1,2,3].each_with_index.each_with_index.each_with_index.each_with_index.to_a
# [[[[[1, 0], 0], 0], 0], [[[[2, 1], 1], 1], 1], [[[[3, 2], 2], 2], 2]]
This behaviour makes it possible to work with very large streams of objects, without needing to pass huge arrays between method calls. If the output isn't needed, nothing is calculated. If 3 elements are needed at the end, only 3 elements are calculated.
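As a minimal sketch of the "only 3 elements are calculated" point, the snippet below counts how often a block runs. It assumes Enumerator::Lazy (via `lazy`) is used, which is what guarantees element-by-element evaluation; the names `calls`, `squares` and `first_three` are only illustrative.

```ruby
calls = 0

# An endless stream of squares; nothing is computed when the chain is built.
squares = (1..Float::INFINITY).lazy.map do |n|
  calls += 1
  n * n
end

calls_before = calls            # still 0, the block has not run yet
first_three = squares.first(3)  # forces exactly three elements

p calls_before  # 0
p first_three   # [1, 4, 9]
p calls         # 3
```

Without `lazy`, `map` with a block would try to build the whole (infinite) array before `first` ever ran.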
The proxy pattern is heavily used in Rails, for example with ActiveRecord::Relation :
@person = Person.where(name: "Jason").where(age: 26)
It would be inefficient to launch 2 DB queries in this case. You can only know that at the end of the chained methods, though. Here's a related answer (How does Rails ActiveRecord chain “where” clauses without multiple queries?)
MyEnumerator
Here's a quick and dirty MyEnumerator class. It might help you understand the logic for the method calls in your question:
class MyEnumerator < Array
  def initialize(*p)
    @methods = []
    @blocks = []
    super
  end

  def group_by(&b)
    save_method_and_block(__method__, &b)
    self
  end

  def each_with_index(&b)
    save_method_and_block(__method__, &b)
    self
  end

  def to_s
    "MyEnumerable object #{inspect} with methods : #{@methods} and #{@blocks}"
  end

  # Replay the recorded methods, in order, on the underlying array.
  def apply
    result = to_a
    puts "Starting with #{result}"
    @methods.zip(@blocks).each do |m, b|
      if b
        puts "Apply method #{m} with block #{b} to #{result}"
      else
        puts "Apply method #{m} without block to #{result}"
      end
      result = result.send(m, &b)
    end
    result
  end

  private

  def save_method_and_block(method, &b)
    @methods << method
    @blocks << b
  end
end
letters = %w[e d c b a]
puts MyEnumerator.new(letters).group_by.each_with_index { |_, i| i % 3 }.to_s
# MyEnumerable object ["e", "d", "c", "b", "a"] with methods : [:group_by, :each_with_index] and [nil, #<Proc:0x00000001da2518@my_enumerator.rb:35>]
puts MyEnumerator.new(letters).group_by.each_with_index { |_, i| i % 3 }.apply
# Starting with ["e", "d", "c", "b", "a"]
# Apply method group_by without block to ["e", "d", "c", "b", "a"]
# Apply method each_with_index with block #<Proc:0x00000000e2cb38@my_enumerator.rb:42> to #<Enumerator:0x00000000e2c610>
# {0=>["e", "b"], 1=>["d", "a"], 2=>["c"]}
puts MyEnumerator.new(letters).each_with_index.group_by { |_item, index| index % 3 }.to_s
# MyEnumerable object ["e", "d", "c", "b", "a"] with methods : [:each_with_index, :group_by] and [nil, #<Proc:0x0000000266c220@my_enumerator.rb:48>]
puts MyEnumerator.new(letters).each_with_index.group_by { |_item, index| index % 3 }.apply
# Apply method each_with_index without block to ["e", "d", "c", "b", "a"]
# Apply method group_by with block #<Proc:0x0000000266bd70@my_enumerator.rb:50> to #<Enumerator:0x0000000266b938>
# {0=>[["e", 0], ["b", 3]], 1=>[["d", 1], ["a", 4]], 2=>[["c", 2]]}
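To tie this back to the two chains in the question, here is a small sketch that "unrolls" each chain into an equivalent form without chained enumerators. The variable names (`counter`, `pairs`, and so on) are made up for illustration.

```ruby
letters = %w[e d c b a]

# Chain 1: group_by still receives single items; each_with_index only
# supplies the index to the block, whose result becomes the group key.
chain1 = letters.group_by.each_with_index { |_item, index| index % 3 }

counter = -1
chain1_unrolled = letters.group_by do |_item|
  counter += 1
  counter % 3
end

# Chain 2: each_with_index produces [item, index] pairs first, and
# group_by then groups those pairs.
chain2 = letters.each_with_index.group_by { |_item, index| index % 3 }

pairs = letters.each_with_index.to_a
chain2_unrolled = pairs.group_by { |_item, index| index % 3 }

p chain1 == chain1_unrolled  # true
p chain2 == chain2_unrolled  # true
```

In both cases the method on the left decides what gets collected (items or pairs), and the block only provides the key.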

Related

Ruby #to_enum: what's the best way to extract the original object from the enumerator?

Suppose I have an object:
obj = Object.new #<Object:0x00007fbe36b4db28>
And I convert it to an Enumerator:
obj_enum = obj.to_enum #<Enumerator: #<Object:0x00007fbe36b4db28>:each>
Now I want to get my object back from the enumerator. I found a way to do it, but it seems unnecessarily abstruse (not to mention pretty fragile):
extracted_obj = ObjectSpace._id2ref(
obj_enum.inspect.match(/0x[0-9a-f]*/).values_at(0)[0].to_i(16)/2
)
p obj.equal? extracted_obj # => true
In case it isn't clear, I'm inspecting the Enumerator object, using regex to pull the original object's id from the resulting string, converting it to an integer (and dividing by 2), and using ObjectSpace._id2ref to convert the id to a reference to my object. Ugly stuff.
I have trouble believing that this is the easiest way to get this job done, but some hours of googling haven't revealed anything to me. Is there a simple way to extract an object after wrapping an Enumerator around it with #to_enum, or is this pretty much the way to do it?
Edit:
As Amadan says below (and much appreciated, Amadan), this may be an XY problem, and I may have to rethink my solution. I'll explain a bit about how I got here.
The (meta) use case: I have a (variable) quantity of objects in an array. The objects each expose an array of integers (all of the same size) as an attribute, sorted high to low. I want to iterate the arrays of each of the objects simultaneously, finding the object or objects with the highest integer not matched in another object's array.
It seemed like external iteration was a good way to go about doing this, since simultaneous internal iteration of multiple objects that have to know about the intermediate results of one another's iterations gets out there pretty quickly as well. But when I have found the enumerator that contains the object with the array with the highest value, I need to return the object that it wraps.
There very well may be a better way to go than using enumerators when this is a requirement. However, the actual iteration and selection process is pretty trivial when using them.
So. The applied use case: a quantity of poker hands with no hand better than a high card. Find the winning hand. The "winning hand" is the hand with the highest card not matched in rank by another hand. (Suits are irrelevant.) If all the cards match in two or more hands, return those hands in an array.
"Minimal reproducible example":
class Hand
  attr_reader :cards

  def initialize(cards)
    @cards = cards.sort.reverse
  end

  def each
    @cards.each { |card| yield(card.first) }
  end
end

class Poker
  def initialize(hands)
    @hands = hands.map { |hand| Hand.new(hand) }
  end

  def high_cards
    hand_enums = @hands.map(&:to_enum)
    loop do
      max_rank = hand_enums.map(&:peek).max
      hand_enums.delete_if { |enum| enum.peek != max_rank }
      hand_enums.each(&:next)
    end
    hand_enums.map { |e| from_enum(e).cards }
  end

  def from_enum(enum)
    ObjectSpace._id2ref(
      enum.inspect.match(/0x[0-9a-f]*/).values_at(0)[0].to_i(16) / 2
    )
  end
end
hands = [
[[10, "D"], [3, "C"], [8, "C"], [7, "C"], [9, "D"]],
[[10, "D"], [8, "S"], [7, "S"], [9, "H"], [2, "H"]],
[[9, "C"], [8, "H"], [9, "S"], [4, "C"], [7, "D"]]
]
game = Poker.new(hands)
p game.high_cards # => [[[10, "D"], [9, "D"], [8, "C"], [7, "C"], [3, "C"]]]
This "works," but I certainly agree with Amadan that it's a hack. An interesting and instructive one, maybe, but a hack all the same. TIA for any suggestions.
I am not sure why all this talk of enums.
I'll presume hands are in a format similar to this:
hands = [Hand.new([[12, "H"], [10, "H"], [8, "D"], [3, "D"], [2, "C"]]),
Hand.new([[10, "D"], [9, "H"], [3, "C"], [2, "D"], [2, "H"]]),
Hand.new([[12, "D"], [10, "S"], [8, "C"], [3, "S"], [2, "H"]]),
Hand.new([[12, "C"], [9, "S"], [8, "C"], [8, "S"], [8, "S"]])]
and you want to get the 0th and 2nd element. I'll also assume the hands are sorted, per your assertion. As far as I can see, this is all that is needed, since arrays compare lexicographically:
max_ranks = hands.map { |hand| hand.cards.map(&:first) }.max
max_hands = hands.select { |hand| hand.cards.map(&:first) == max_ranks }
Alternately, use group_by (a bit better, as it doesn't need to calculate ranks twice):
hands_by_ranks = hands.group_by { |hand| hand.cards.map(&:first) }
max_hands = hands_by_ranks[hands_by_ranks.keys.max]
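A quick sketch of what "arrays compare lexicographically" means in practice, using bare rank arrays instead of Hand objects (an assumption made to keep the example self-contained; the ranks mirror the hands above):

```ruby
ranks = [
  [12, 10, 8, 3, 2],
  [10, 9, 3, 2, 2],
  [12, 10, 8, 3, 2],
  [12, 9, 8, 8, 8]
]

# Array#<=> compares element by element and stops at the first difference:
# 12 == 12, then 10 > 9, so the left array is "greater".
comparison = [12, 10, 8, 3, 2] <=> [12, 9, 8, 8, 8]

best = ranks.max
winner_positions = ranks.each_index.select { |i| ranks[i] == best }

p comparison        # 1
p best              # [12, 10, 8, 3, 2]
p winner_positions  # [0, 2]
```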
As already mentioned in the comments, there is no way to reliably retrieve the underlying object, because the enumerator doesn't always have one.
While your solution will work for this specific case, I would suggest coming up with another approach that will not depend on the implementation details of the enumerator object.
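To illustrate why there is not always an underlying object, compare an enumerator created with to_enum against one built from a block. This is only a sketch; the variable names are illustrative.

```ruby
# Wraps an existing collection.
wrapping = [3, 2, 1].to_enum

# Built from a block: there is no user-visible collection behind it.
generated = Enumerator.new do |yielder|
  3.downto(1) { |n| yielder << n }
end

same_values = wrapping.to_a == generated.to_a
wrapping_inspect = wrapping.inspect
generated_inspect = generated.inspect

p same_values        # true
puts wrapping_inspect   # #<Enumerator: [3, 2, 1]:each>
puts generated_inspect  # mentions an internal Enumerator::Generator
```

Parsing `inspect` on the second enumerator would at best lead to an internal generator object, not to anything resembling a Hand.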
One of the possible solutions would be to pass an instance of Hand along with the enumerator.
class Poker
def high_cards(hands)
hand_enums = hands.map { |hand| [hand, hand.to_enum] }
loop do
max_rank = hand_enums.map(&:last).map(&:peek).max
hand_enums.delete_if {|enum| enum.last.peek != max_rank }
hand_enums.each {|pair| pair.last.next}
end
hand_enums.map(&:first)
end
end
Another more object-oriented approach is to introduce a custom Enumerator and expose the underlying object in a more explicit way:
class HandEnumerator < Enumerator
  attr_reader :hand

  def initialize(hand, &block)
    @hand = hand
    # Forward the block as a block (not as a positional argument).
    super(&block)
  end
end

class Hand
  def to_enum
    HandEnumerator.new(self) { |yielder| @cards.each { |card| yielder << card.first } }
  end

  # To satisfy interface of enumerator creation
  def enum_for
    to_enum
  end
end
class Poker
def high_cards(hands)
hand_enums = hands.map(&:to_enum)
loop do
max_rank = hand_enums.map(&:peek).max
hand_enums.delete_if {|enum| enum.peek != max_rank }
hand_enums.each(&:next)
end
hand_enums.map(&:hand)
end
end

How does the enumerator receive a block?

I have:
letters = %w(e d c b a)
In the following:
letters.group_by.each_with_index { |item, index| index % 3 }
#=> {0=>["e", "b"], 1=>["d", "a"], 2=>["c"]}
how can the enumerator returned by group_by know the block it will execute? Is the block received by each_with_index passed to the enumerator which it is based on?
In the following:
letters.each_with_index.group_by { |item, index| index % 3 }
#=> {0=>[["e", 0], ["b", 3]], 1=>[["d", 1], ["a", 4]], 2=>[["c", 2]]}
will the block be passed to the enumerator returned by each_with_index? If it will, how does each_with_index execute it?
In general:
How is a block retrieved by a method in an enumerator that doesn't directly receive the block?
Will a block be passed through an enumerator chain? Where will it be executed?
There's some tricky stuff going on here, which is probably why you're a little hazy on how it works. Enumerators are one of the most important things in Ruby: they're the backbone of the Enumerable system, which is where Ruby really shines. But they're often used in ways where they're transparent, living in the shadows, so you rarely have to pay direct attention to them.
Looking more closely, step through this bit by bit:
letters.group_by
# => #<Enumerator: ["e", "d", "c", "b", "a"]:group_by>
Now this is an Enumerator instance. The each_with_index you chain on the end is actually an Enumerator-specific method:
letters.group_by.method(:each_with_index)
# => #<Method: Enumerator#each_with_index>
This is in contrast to your second approach:
letters.method(:each_with_index)
# => #<Method: Array(Enumerable)#each_with_index>
That one is Array's method which, conveniently, you can chain into a method like group_by.
So the story here is that group_by in chain mode actually provides special methods that have the effect of back-propagating your block to the group_by level.
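The "back-propagation" can be imitated in plain Ruby. The sketch below is not how group_by is implemented in the interpreter; `SketchCollection` and `group_by_sketch` are hypothetical names, and the only assumption is the common idiom of returning `to_enum` when no block is given.

```ruby
class SketchCollection
  def initialize(items)
    @items = items
  end

  # Hypothetical stand-in for group_by.
  def group_by_sketch
    # Without a block, hand back an Enumerator that remembers this method.
    return to_enum(:group_by_sketch) unless block_given?

    groups = {}
    @items.each do |item|
      key = yield(item) # the value of the caller's block comes back here
      (groups[key] ||= []) << item
    end
    groups
  end
end

letters = %w[e d c b a]
enum = SketchCollection.new(letters).group_by_sketch

# Enumerator#each_with_index re-runs group_by_sketch with a block that
# adds the index and returns whatever the user's block returns.
result = enum.each_with_index { |_item, index| index % 3 }

p enum.class  # Enumerator
p result      # {0=>["e", "b"], 1=>["d", "a"], 2=>["c"]}
```

So the block is never stored by the first method call; it is handed over when the enumerator finally iterates, and its return value flows back through `yield`.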

How do the Array methods below differ from each other in Ruby?

I am getting confused with the Array methods below. Can anyone help me understand how differently they work from each other with the help of simple snippet?
array.sort and array.sort { | a,b | block }
array.to_a and array.to_ary
array.size and array.length
array.reverse and array.reverse_each {|item| block }
array.fill(start [, length] ) { |index| block } and
array.fill(range) { |index| block }
Please read the documentation for Array.
sort:
a=[3,1,2]
a.sort # => [1, 2, 3]
a.sort{|a,b| b<=>a} # => [3, 2, 1]
use the second one if you need some custom way to sort elements.
to_a vs. to_ary:
class Foo < Array;end
b=Foo[1,2]
b.to_ary.class # => Foo (to_ary returns self)
b.to_a.class # => Array (to_a converts to a plain Array)
size and length are exactly the same.
reverse_each is pretty much the same as reverse.each.
If you want to fill only a part of the array, you can call Array#fill either with a range or with a start index and a length. Those are just different ways to achieve the same result:
(["a"]*10).fill("b",2..7)
(["a"]*10).fill("b",2,6)
both return ["a", "a", "b", "b", "b", "b", "b", "b", "a", "a"].
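The question also asks about the block forms of fill. As a brief sketch, the block receives each index to be filled and its return value becomes the element (variable names are illustrative):

```ruby
# No arguments: every index is filled from the block.
squares = Array.new(5, 0).fill { |i| i * i }

# Start index 1, length 2: only indexes 1 and 2 are filled.
by_start_and_length = Array.new(5, 0).fill(1, 2) { |i| i * 10 }

# Range 1..2: the same two indexes.
by_range = Array.new(5, 0).fill(1..2) { |i| i * 10 }

p squares              # [0, 1, 4, 9, 16]
p by_start_and_length  # [0, 10, 20, 0, 0]
p by_range             # [0, 10, 20, 0, 0]
```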

Delete contents of array based on a set of indexes

delete_at only takes a single index. What's a good way to achieve this using built-in methods?
Doesn't have to be a set, can be an array of indexes as well.
arr = ["a", "b", "c"]
set = Set.new [1, 2]
arr.delete_at set
# => arr = ["a"]
One-liner:
arr.delete_if.with_index { |_, index| set.include? index }
Re-open the Array class and add a new method for this.
class Array
def delete_at_multi(arr)
arr = arr.sort.reverse # delete highest indexes first.
arr.each do |i|
self.delete_at i
end
self
end
end
arr = ["a", "b", "c"]
set = [1, 2]
arr.delete_at_multi(set)
arr # => ["a"]
This could of course be written as a stand-alone method if you don't want to re-open the class. Making sure the indexes are in reverse order is very important, otherwise you change the position of elements later in the array that are supposed to be deleted.
Try this:
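arr.reject { |item| set.include? arr.index(item) } # => ["a"]
It's a bit ugly, I think ;) It also returns a new array instead of modifying arr, and arr.index finds only the first occurrence, so duplicate elements are handled incorrectly. Maybe someone can suggest a better solution?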
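A short sketch of why looking up positions with arr.index is fragile when the array contains duplicates, next to a position-based alternative using reject.with_index. The data and variable names are made up for illustration.

```ruby
require "set"

arr = %w[a b a c]
doomed = Set[2, 3] # indexes to drop: the second "a" and "c"

# Position-based: with_index passes the real index of each element.
by_position = arr.reject.with_index { |_item, index| doomed.include?(index) }

# Lookup-based: arr.index("a") is always 0, so the second "a" survives.
by_lookup = arr.reject { |item| doomed.include?(arr.index(item)) }

p by_position  # ["a", "b"]
p by_lookup    # ["a", "b", "a"]
p arr          # ["a", "b", "a", "c"] (reject does not modify the receiver)
```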
Functional approach:
class Array
def except_values_at(*indexes)
([-1] + indexes + [self.size]).sort.each_cons(2).flat_map do |idx1, idx2|
self[idx1+1...idx2] || []
end
end
end
>> ["a", "b", "c", "d", "e"].except_values_at(1, 3)
=> ["a", "c", "e"]

Eliminate consecutive duplicates of list elements

What is the best solution to eliminate consecutive duplicates of list elements?
list = compress(['a','a','a','a','b','c','c','a','a','d','e','e','e','e'])
p list # => ['a','b','c','a','d','e']
I have this one:
def compress(list)
list.map.with_index do |element, index|
element unless element.equal? list[index+1]
end.compact
end
Ruby 1.9.2
Nice opportunity to use Enumerable#chunk, as long as your list doesn't contain nil:
list.chunk(&:itself).map(&:first)
For Ruby older than 2.2.x, you can require "backports/2.2.0/kernel/itself" or use {|x| x} instead of (&:itself).
For Ruby older than 1.9.2, you can require "backports/1.9.2/enumerable/chunk" to get a pure Ruby version of it.
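The nil caveat comes from chunk treating a nil key as "drop this element". A small sketch of that behaviour, with Enumerable#slice_when (Ruby 2.2+) as one possible alternative that does not look at keys; the sample list is made up.

```ruby
list = [nil, nil, "a", "a", "b", nil]

# chunk drops elements whose key is nil, so the nils disappear entirely.
with_chunk = list.chunk(&:itself).map(&:first)

# slice_when starts a new slice whenever two neighbours differ.
with_slice_when = list.slice_when { |a, b| a != b }.map(&:first)

p with_chunk       # ["a", "b"]
p with_slice_when  # [nil, "a", "b", nil]
```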
Do this (provided that each element is a single character)
list.join.squeeze.split('')
Ruby 1.9+
list.select.with_index{|e,i| e != list[i+1]}
with respect to @sawa, who told me about with_index :)
As @Marc-André Lafortune noticed, if there is a nil at the end of your list this won't work. We can fix it with this ugly construct:
list.select.with_index{|e,i| i < (list.size-1) and e != list[i+1]}
# Requires Ruby 1.8.7+ due to Object#tap
def compress(items)
last = nil
[].tap do |result|
items.each{ |o| result << o unless last==o; last=o }
end
end
list = compress(%w[ a a a a b c c a a d e e e e ])
p list
#=> ["a", "b", "c", "a", "d", "e"]
arr = ['a','a','a','a','b','c','c','a','a','d','e','e','e','e']
enum = arr.each
#=> #<Enumerator: ["a", "a", "a", "a", "b", "c", "c", "a", "a", "d",
# "e", "e", "e", "e"]:each>
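a = []
last = nil
loop do
  last = enum.next
  a << last unless last == enum.peek
end
a << last unless arr.empty? # the final element never gets past peek, so add it here
a #=> ["a", "b", "c", "a", "d", "e"]
Enumerator#peek raises a StopIteration exception when the enumerator has already returned its last element. Kernel#loop handles that exception by breaking out of the loop, which is why the final element is appended after the loop.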
See Array#each and Enumerator#next. Kernel#to_enum [1] can be used in place of Array#each.
[1] to_enum is an Object instance method that is defined in the Kernel module but documented in the Object class. Got that?
