How does Ruby Enumerators chaining work exactly?

Consider the following code:
[1,2,3].map.with_index { |x, i| x * i }
# => [0,2,6]
How does this work exactly?
My mental model of map is that it iterates and applies a function to each element. Is with_index somehow passing a function to the enumerator [1,2,3].map, and if so, what would that function be?
This SO thread shows how enumerators pass data through, but doesn't answer the question. Indeed, if you replace map with each then the behaviour is different:
[1,2,3].each.with_index { |x, i| x * i }
# => [1,2,3]
map seems to carry the information that a function has to be applied, on top of carrying the data to iterate over. How does that work?

Todd's answer is excellent, but I feel like seeing some more Ruby code might be beneficial. Specifically, let's try to write each and map on Array ourselves.
I won't use any Enumerable or Enumerator methods directly, so we can see how it all works under the hood. (I'll still use for loops, which technically call #each under the hood, but that's only cheating a little.)
First, there's each. each is easy. It iterates over the array and applies a function to each element, before returning the original array.
def my_each(arr, &block)
  for i in 0..arr.length-1
    block[arr[i]]
  end
  arr
end
Simple enough. Now what if we don't pass a block? Let's change it up a bit to support that. We effectively want to delay the act of doing the each, to allow the Enumerator to do its thing.
def my_each(arr, &block)
  if block
    for i in 0..arr.length-1
      block[arr[i]]
    end
    arr
  else
    Enumerator.new do |y|
      my_each(arr) { |*x| y.yield(*x) }
    end
  end
end
So if we don't pass a block, we make an Enumerator that, when consumed, calls my_each, using the enumerator yield object as a block. The y object is a funny thing but you can just think of it as basically being the block you'll eventually pass in. So, in
my_each([1, 2, 3]).with_index { |x, i| x * i }
Think of y as being like the { |x, i| x * i } bit. It's a bit more complicated than that, but that's the idea.
Incidentally, on Ruby 2.7 and later, the Enumerator::Yielder object got its own #to_proc, so if you're on a recent Ruby version, you can just do
Enumerator.new do |y|
  my_each(arr, &y)
end
rather than
Enumerator.new do |y|
  my_each(arr) { |*x| y.yield(*x) }
end
Now let's extend this approach to map. Writing map with a block is easy. It's just like each but we accumulate the results.
def my_map(arr, &block)
  result = []
  for i in 0..arr.length-1
    result << block[arr[i]]
  end
  result
end
Simple enough. Now what if we don't pass a block? Let's do the exact same thing we did for my_each. That is, we're just going to make an Enumerator and, inside that Enumerator, we call my_map.
def my_map(arr, &block)
  if block
    result = []
    for i in 0..arr.length-1
      result << block[arr[i]]
    end
    result
  else
    Enumerator.new do |y|
      my_map(arr) { |*x| y.yield(*x) }
    end
  end
end
Now, the Enumerator knows that, whenever it eventually gets a block, it's going to use my_map on that block at the end. We can see that these two functions actually behave, on arrays, like map and each do:
my_each([1, 2, 3]).with_index { |x, i| x * i } # [1, 2, 3]
my_map([1, 2, 3]).with_index { |x, i| x * i } # [0, 2, 6]
So your intuition was spot on:
map seems to carry the information that a function has to be applied, on top of carrying the data to iterate over. How does that work?
That's exactly what it does. map creates an Enumerator whose block knows to call map at the end, whereas each does the same but with each. Of course, in reality, all of this is implemented in C for efficiency and bootstrapping reasons, but the fundamental idea is still there.
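To see the "remembered method" idea on the real built-ins rather than the hand-written versions, here is a small sketch. The variable names are only illustrative; the point is that the enumerator records which method created it, and supplying a block later runs that method.

```ruby
map_enum  = [1, 2, 3].map   # no block given, so we get an Enumerator back
each_enum = [1, 2, 3].each

# inspect shows the receiver and the method the enumerator will call later.
map_inspect  = map_enum.inspect    # "#<Enumerator: [1, 2, 3]:map>"
each_inspect = each_enum.inspect   # "#<Enumerator: [1, 2, 3]:each>"

# Supplying a block finally runs the remembered method with that block.
via_map  = map_enum.each  { |x| x * 2 }   # behaves like [1, 2, 3].map  { ... }
via_each = each_enum.each { |x| x * 2 }   # behaves like [1, 2, 3].each { ... }

p via_map    # [2, 4, 6]
p via_each   # [1, 2, 3]
```

with_index works the same way: it supplies the block and adds a running index as an extra argument.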

Using Array#map without a block simply returns an Enumerator, where each element is then fed to Enumerator#with_index and the results of the block are returned as a collection. It's not complicated, and is similar to (but perhaps cleaner than) the following code. Using Ruby 3.0.1:
results = []
[1, 2, 3].each_with_index { results << _1 * _2 }
results
#=> [0, 2, 6]
Using Array#each doesn't return a collection from the block. It just returns self or another enumerator, so the expected behavior is different by design.
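As a quick side-by-side of the variants discussed here (a sketch, with a throwaway numbers array), the return value depends on which method ends up consuming the block:

```ruby
numbers = [1, 2, 3]

# map consumes the block, so the block results are collected.
a = numbers.map.with_index { |x, i| x * i }
b = numbers.each_with_index.map { |x, i| x * i }

# each consumes the block, so the block results are discarded
# and the original array comes back.
c = numbers.each.with_index { |x, i| x * i }
d = numbers.each_with_index { |x, i| x * i }

p a  # [0, 2, 6]
p b  # [0, 2, 6]
p c  # [1, 2, 3]
p d  # [1, 2, 3]
```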

Related

Using range.each vs while-loop to work with sequence of numbers in Ruby

Total beginner here, so I apologize if a) this question isn't appropriate or b) I haven't asked it properly.
I'm working on simple practice problems in Ruby and I noticed that while I arrived at a solution that works, when my solution runs in a visualizer, it gives premature returns for the array. Is this problematic? I'm also wondering if there's any reason (stylistically, conceptually, etc.) why you would want to use a while-loop vs. a for-loop with range for a problem like this or fizzbuzz.
Thank you for any help/advice!
The practice problem is:
# Write a method which collects all numbers between small_num and big_num into
# an array. Ex: range(2, 5) => [2, 3, 4, 5]
My solution:
def range(small_num, big_num)
  arr = []
  (small_num..big_num).each do |num|
    arr.push(num)
  end
  return arr
end
The provided solution:
def range(small_num, big_num)
  collection = []
  i = small_num
  while i <= big_num
    collection << i
    i += 1
  end
  collection
end

Here's a simplified version of your code:
def range(small_num, big_num)
  arr = [ ]
  (small_num..big_num).each do |num|
    arr << num
  end
  arr
end
Note that << (like push) does technically have a return value, and that return value is the modified array. This is just how Ruby works: every method returns something, even if that something is "nothing" in the form of nil. As with everything in Ruby, even nil is an object.
You're not obligated to use the return values, though if you did want to you could. Here's a version with inject:
def range(small_num, big_num)
  (small_num..big_num).inject([ ]) do |arr, num|
    arr << num
  end
end
The inject method takes the return value of each block call and feeds it in as the accumulator for the next round. Since << returns the array, this makes it very convenient to chain.
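To make the "return value feeds the next round" behaviour visible, here is a small sketch that records the accumulator at the start of every block call. The steps array is only there for illustration.

```ruby
steps = []

result = (2..5).inject([]) do |arr, num|
  steps << arr.dup   # snapshot of the accumulator as this round begins
  arr << num         # << returns arr, which becomes the next round's accumulator
end

p steps   # [[], [2], [2, 3], [2, 3, 4]]
p result  # [2, 3, 4, 5]
```

If the block returned something else (for example nil from a trailing puts), that value would be passed in as arr on the next round instead.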
The most minimal version is, of course:
def range(small_num, big_num)
  (small_num..big_num).to_a
end
Or as Sagar points out, using the splat operator:
def range(small_num, big_num)
  [*small_num..big_num]
end
When you splat a range inside an array literal, its values are expanded directly into the array instead of the range being stored as a single element.

Ruby's factor method explanation

class Integer
  def factors
    1.upto(Math.sqrt(self)).select {|i| (self % i).zero?}.inject([]) do |f, i|
      f << self/i unless i == self/i
      f << i
    end.sort
  end
end
[45, 53, 64].each {|n| puts "#{n} : #{n.factors}"}
The code above is a method that finds all the factors of an integer. There are several places where I am not too sure about the syntax:
inject([]) - I have seen inject(:+) and inject(:*), which add or multiply each element into the result. I wonder if this one pushes into an existing array? What's the best way to explain this method?
After the inject([]) there is a do |f, i| block. I am not too sure if this i is different from the i declared outside the block. I assume not? I am also not too sure what this block is trying to achieve.
end.sort - I haven't seen this before.
Would be grateful to have advice on this block of code! Thanks!

For an array of numbers, my_array.inject(&:+) gives the same result as my_array.inject(0) do |a, b| a + b end
With the explicit 0, the first a will be 0 and the first b will be the first element of my_array. In your code the initial value is [] instead, so f starts out as an empty array and the block pushes factors into it.
The i inside the inject block is a block parameter set by inject; it is not the same variable as the i in the select block. inject will do something like yield(current_value, self[current_index]) and these will be your |f,i|
In end.sort, end closes the do...end block passed to inject, so sort is called on the array that inject returns.
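The same logic can be spelled out with intermediate variables, which may make each stage easier to follow. This is only a sketch: factors_of is a hypothetical standalone method (not the Integer#factors from the question), and it uses Integer.sqrt, available from Ruby 2.5, in place of Math.sqrt.

```ruby
def factors_of(n)
  # Stage 1: every divisor up to the square root of n.
  small_divisors = 1.upto(Integer.sqrt(n)).select { |i| (n % i).zero? }

  # Stage 2: for each small divisor i, also collect its partner n / i.
  # `found` starts as [] and is returned by the last << on each round.
  collected = small_divisors.inject([]) do |found, i|
    partner = n / i
    found << partner unless i == partner   # avoid adding a square root twice
    found << i
  end

  # Stage 3: this is the `end.sort` from the original, on its own line.
  collected.sort
end

p factors_of(45)  # [1, 3, 5, 9, 15, 45]
p factors_of(53)  # [1, 53]
p factors_of(64)  # [1, 2, 4, 8, 16, 32, 64]
```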

Ruby yield example explanation?

I'm doing a SaaS course with Ruby. On an exercise, I'm asked to calculate the cartesian product of two sequences by using iterators, blocks and yield.
I ended up with this, by pure guess-and-error, and it seems to work. But I'm not sure about how. I seem to understand the basic blocks and yield usage, but this? Not at all.
class CartProd
  include Enumerable
  def initialize(a,b)
    @a = a
    @b = b
  end
  def each
    @a.each{|ae|
      @b.each{|be|
        yield [ae,be]
      }
    }
  end
end
Some explanation for a noob like me, please?
(PS: I changed the required class name to CartProd so people doing the course can't find the response by googling it so easily)

Let's build this up step-by-step. We will simplify things a bit by taking it out of the class context.
For this example it is intuitive to think of an iterator as being a more-powerful replacement for a traditional for-loop.
So first here's a for-loop version:
seq1 = (0..2)
seq2 = (0..2)
for x in seq1
  for y in seq2
    p [x,y] # shorthand for puts [x, y].inspect
  end
end
Now let's replace that with more Ruby-idiomatic iterator style, explicitly supplying blocks to be executed (i.e., the do...end blocks):
seq1.each do |x|
  seq2.each do |y|
    p [x,y]
  end
end
So far, so good, you've printed out your cartesian product. Now your assignment asks you to use yield as well. The point of yield is to "yield execution", i.e., pass control to another block of code temporarily (optionally passing one or more arguments).
So, although it's not really necessary for this toy example, instead of directly printing the value like above, you can yield the value, and let the caller supply a block that accepts that value and prints it instead.
That could look like this:
def prod(seq1, seq2)
  seq1.each do |x|
    seq2.each do |y|
      yield [x,y]
    end
  end
end
Callable like this:
prod(1..2, 1..2) { |pair| p pair }
The yield supplies the product for each run of the inner loop, and the yielded value is printed by the block supplied by the caller.

What exactly do you not understand here? You've made an iterator that yields all possible pairs of elements. If you pass CartProd#each a block, it will be executed a.length*b.length times. It's like having two different for cycles folded one into another in any other programming language.

yield simply passes (yields) control to a block of code that has been passed in as part of the method call. The values after the yield keyword are passed into the block as arguments. Once the block has finished execution it passes back control.
So, in your example you could call #each like this:
CartProd.new([1, 2], [3, 4]).each do |pair|
  # control is yielded to this block
  p pair
  # control is returned at end of block
end
This would output each pair of values.
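Because the class includes Enumerable, defining each with yield is all that is needed to get to_a, map and the rest for free. The sketch below uses a hypothetical class name, CartProdSketch, and adds one line that is not in the original: returning to_enum(:each) when no block is given, so that each can also be chained like the built-in iterators.

```ruby
class CartProdSketch
  include Enumerable

  def initialize(a, b)
    @a = a
    @b = b
  end

  def each
    # Assumption beyond the original: without a block, hand back an Enumerator.
    return to_enum(:each) unless block_given?

    @a.each do |ae|
      @b.each do |be|
        yield [ae, be]   # control passes to the caller's block for every pair
      end
    end
  end
end

pairs = CartProdSketch.new([1, 2], [3, 4])

p pairs.to_a                   # [[1, 3], [1, 4], [2, 3], [2, 4]]
p pairs.map { |x, y| x * y }   # [3, 4, 6, 8]
p pairs.each.with_index.to_a   # [[[1, 3], 0], [[1, 4], 1], [[2, 3], 2], [[2, 4], 3]]
```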

Accessing a passed block in Ruby

I have a method that accepts a block; let's call it outer. It in turn calls a method that accepts another block; call it inner.
What I would like to have happen is for outer to call inner, passing it a new block which calls the first block.
Here's a concrete example:
class Array
  def delete_if_index
    self.each_with_index do |element, i|
      # ... remove element from the array if the block passed to delete_if_index returns true ...
    end
  end
end
['a','b','c','d'].delete_if_index { |i| i.even? }
# => ['b','d']
The block passed to delete_if_index is called by the block passed to each_with_index.
Is this possible in Ruby, and, more broadly, how much access do we have to the block within the function that receives it?

You can wrap a block in another block:
def outer(&block)
  if some_condition_is_true # placeholder for whatever condition you need
    wrapper = lambda {
      p 'Do something crazy in this wrapper'
      block.call # original block
    }
    inner(&wrapper)
  else
    inner(&block) # pass the original block straight through
  end
end
def inner(&block)
  p 'inner called'
  yield
end
outer do
  p 'inside block'
  sleep 1
end
I'd say opening up an existing block and changing its contents is Doing It Wrong™; maybe continuation-passing would help here? I'd also be wary of passing around blocks with side effects; I try to keep lambdas deterministic and have actions like deleting stuff in the method body. In a complex application this will likely make debugging a lot easier.

Maybe the example is poorly chosen, but your concrete example is the same as:
[1,2,3,4].reject(&:even?)
Opening up and modifying a block strikes me as code smell. It'd be difficult to write it in a way that makes the side effects obvious.
Given your example, I think a combination of higher order functions will do what you're looking to solve.
Update: It's not the same, as pointed out in the comments. [1,2,3,4].reject(&:even?) looks at the contents, not the index (and returns [1,3], not [2,4] as it would in the question). The one below is equivalent to the original example, but isn't very pretty.
[1,2,3,4].each_with_index.reject {|element, index| index.even? }.map(&:first)

So here's a solution to my own question. The passed in block is implicitly converted into a proc which can be received with the & parameter syntax. The proc then exists inside the closure of any nested block, as it is assigned to a local variable in scope, and can be called by it:
class Array
  def delete_if_index(&proc)
    ary = []
    self.each_with_index { |a, i| ary << self[i] unless proc.call(i) }
    ary
  end
end
[0,1,2,3,4,5,6,7,8,9,10].delete_if_index {|index| index.even?}
# => [1, 3, 5, 7, 9]
Here the block is converted into a proc, and assigned to the variable proc, which is then available within the block passed to each_with_index.
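For comparison, the same result can be had without reopening Array, by chaining the blockless reject enumerator with with_index. This is a sketch of an alternative, not the approach in the answer above; the variable names are illustrative.

```ruby
letters = %w[a b c d]

# reject without a block returns an Enumerator; with_index supplies the block
# and passes the running index as the second argument.
kept = letters.reject.with_index { |_element, i| i.even? }

p kept     # ["b", "d"]
p letters  # ["a", "b", "c", "d"] (reject returns a new array)
```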

When is the Enumerator::Yielder#yield method useful?

The question "Meaning of the word yield" mentions the Enumerator::Yielder#yield method. I haven't used it before, and wonder under what circumstances it would be useful.
Is it mainly useful when you want to create an infinite list of items, such as the Sieve of Eratosthenes, and when you need to use an external iterator?

"How to create an infinite enumerable of Times?" talks about constructing lazy iterators, but my favorite usage is wrapping an existing Enumerable with additional functionality (any enumerable, without needing to know what it really is, whether it's infinite or not, etc.).
A trivial example would be implementing the each_with_index method (or, more generally, with_index method):
module Enumerable
  def my_with_index
    Enumerator.new do |yielder|
      i = 0
      self.each do |e|
        yielder.yield e, i
        i += 1
      end
    end
  end
  def my_each_with_index
    self.my_with_index.each do |e, i|
      yield e, i
    end
  end
end
[:foo, :bar, :baz].my_each_with_index do |e,i|
  puts "#{i}: #{e}"
end
#=>0: foo
#=>1: bar
#=>2: baz
Extending to something not already implemented in the core library, such as cyclically assigning values from a given array to each enumerable element (say, for coloring table rows):
module Enumerable
  def with_cycle values
    Enumerator.new do |yielder|
      self.each do |e|
        v = values.shift
        yielder.yield e, v
        values.push v
      end
    end
  end
end
p (1..10).with_cycle([:red, :green, :blue]).to_a # works with any Enumerable, such as Range
#=>[[1, :red], [2, :green], [3, :blue], [4, :red], [5, :green], [6, :blue], [7, :red], [8, :green], [9, :blue], [10, :red]]
The whole point is that these methods return an Enumerator, which you then combine with the usual Enumerable methods, such as select, map, inject etc.
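One detail of with_cycle above is that shift and push rotate the array that was passed in. A variant that leaves the caller's array untouched could look like the sketch below; with_cycle_sketch is a hypothetical standalone method (so Enumerable is not reopened), and it picks the value by index instead. It also shows the returned Enumerator being combined with select and map.

```ruby
def with_cycle_sketch(enumerable, values)
  Enumerator.new do |yielder|
    enumerable.each_with_index do |e, i|
      # Pick the value by position instead of mutating `values`.
      yielder.yield e, values[i % values.size]
    end
  end
end

colors = [:red, :green, :blue]
rows = with_cycle_sketch(1..4, colors)

p rows.to_a
# [[1, :red], [2, :green], [3, :blue], [4, :red]]

# The result is an ordinary Enumerator, so Enumerable methods chain onto it.
red_rows = rows.select { |_n, color| color == :red }.map(&:first)
p red_rows  # [1, 4]
p colors    # [:red, :green, :blue] (unchanged)
```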

For example you can use it to construct Rack response bodies inline, without creating classes. An Enumerator can also work "outside-in" - you call Enumerator#each which calls next on the enumerator and returns every value in sequence. For example, you can make a Rack response body returning a sequence of numbers:
run ->(env) {
  body = Enumerator.new do |y|
    9.times { |i| y.yield(i.to_s) }
  end
  [200, {'Content-Length' => '9'}, body]
}

Since Mladen mentioned getting other answers, I thought I would give an example of something I just did earlier today while writing an application that will receive data from multiple physical devices, analyze the data, and connect related data (that we see from multiple devices). This is a long-running application, and if I never threw away data (say, at least a day old with no updates), then it would grow infinitely large.
In the past, I would have done something like this:
delete_old_stuff if rand(300) == 0
and accomplish this using random numbers. However, this is not purely deterministic. I know that it will run approximately once every 300 evaluations (i.e. seconds), but it won't be exactly once every 300 times.
What I wrote up earlier looks like this:
counter = Enumerator.new do |y|
  a = (0...300) # exclusive range: exactly 300 values per cycle
  loop do
    a.each do |b|
      y.yield b
    end
    delete_old_stuff
  end
end
and I can replace delete_old_stuff if rand(300) == 0 with counter.next
Now, I'm sure there is a more efficient or pre-made way of doing this, but being sparked to play with Enumerator::Yielder#yield by your question and the linked question, this is what I came up with.
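To check how often the cleanup actually fires, here is a scaled-down sketch of the same pattern with a period of 3 instead of 300. The cleanups counter stands in for delete_old_stuff and is purely illustrative.

```ruby
cleanups = 0   # stand-in for calls to delete_old_stuff

counter = Enumerator.new do |y|
  loop do
    3.times { |n| y.yield n }   # hand out 0, 1, 2, pausing at each yield
    cleanups += 1               # runs only when the next value is requested
  end
end

# Each call to next resumes the block until it reaches the following yield.
values = Array.new(7) { counter.next }

p values    # [0, 1, 2, 0, 1, 2, 0]
p cleanups  # 2
```

Note that the cleanup runs on the call that starts a new cycle, so nothing happens during the first three calls.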

It seems to be useful when you have multiple objects you want to enumerate over, but flat_map isn't suitable, and you want to chain the enumeration with another action:
module Enumerable
  def count_by
    items_grouped_by_criteria = group_by {|object| yield object}
    counts = items_grouped_by_criteria.map{|key, array| [key, array.length]}
    Hash[counts]
  end
end
def calculate_letter_frequencies
  each_letter.count_by {|letter| letter}
end
def each_letter
  filenames = ["doc/Quickstart", "doc/Coding style"]
  # Joining the text of each file into a single string would be memory-intensive
  enumerator = Enumerator.new do |yielder|
    filenames.each do |filename|
      text = File.read(filename)
      text.chars.each {|letter| yielder.yield(letter)}
    end
  end
  enumerator
end
calculate_letter_frequencies
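On Ruby 2.7 and later, the counting step could probably be replaced by the built-in Enumerable#tally. The sketch below shows that idea under a simplifying assumption: in-memory strings stand in for the file contents, and each_letter_sketch is a hypothetical name.

```ruby
def each_letter_sketch(texts)
  Enumerator.new do |yielder|
    texts.each do |text|
      # Yield one character at a time rather than joining all texts together.
      text.each_char { |letter| yielder.yield(letter) }
    end
  end
end

frequencies = each_letter_sketch(["abca", "bd"]).tally
p frequencies  # {"a"=>2, "b"=>2, "c"=>1, "d"=>1}
```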
